Provisioning Platform Documentation

Welcome to the Provisioning Platform documentation. This is an enterprise-grade Infrastructure as Code (IaC) platform built with Rust, Nushell, and Nickel.

What is Provisioning

Provisioning is a comprehensive infrastructure automation platform that manages complete infrastructure lifecycles across multiple cloud providers. The platform emphasizes type-safety, configuration-driven design, and workspace-first organization.

Key Features

  • Workspace Management: Default mode for organizing infrastructure, settings, schemas, and extensions
  • Type-Safe Configuration: Nickel-based configuration system with validation and contracts
  • Multi-Cloud Support: Unified interface for AWS, UpCloud, and local providers
  • Modular CLI Architecture: 111+ commands with 84% code reduction through modularity
  • Batch Workflow Engine: Orchestrate complex multi-cloud operations
  • Complete Security System: Authentication, authorization, encryption, and compliance
  • Extensible Architecture: Custom providers, task services, and plugins

Getting Started

New users should start with:

  1. Prerequisites - System requirements and dependencies
  2. Installation - Install the platform
  3. Quick Start - 5-minute deployment tutorial
  4. First Deployment - Comprehensive walkthrough

Documentation Structure

  • Getting Started: Installation and initial setup
  • User Guides: Workflow tutorials and best practices
  • Infrastructure as Code: Nickel configuration and schema reference
  • Platform Features: Core capabilities and systems
  • Operations: Deployment, monitoring, and maintenance
  • Security: Complete security system documentation
  • Development: Extension and plugin development
  • API Reference: REST API and CLI command reference
  • Architecture: System design and ADRs
  • Examples: Practical use cases and patterns
  • Troubleshooting: Problem-solving guides

Core Technologies

  • Rust: Platform services and performance-critical components
  • Nushell: Scripting, CLI, and automation
  • Nickel: Type-safe infrastructure configuration
  • SecretumVault: Secrets management integration

Workspace-First Approach

Provisioning uses workspaces as the default organizational unit. A workspace contains:

  • Infrastructure definitions (Nickel schemas)
  • Environment-specific settings
  • Custom extensions and providers
  • Deployment state and metadata

All operations work within workspace context, providing isolation and consistency.

Support and Community

  • Issues: Report bugs and request features on GitHub
  • Documentation: This documentation site
  • Examples: See the Examples section

License

See project LICENSE file for details.

Getting Started

Your journey to infrastructure automation starts here. This section guides you from zero to your first successful deployment in minutes.

Overview

Getting started with Provisioning involves:

  • Verifying prerequisites - System requirements, tools, cloud accounts
  • Installing the platform - Binary or container installation
  • Initial configuration - Environment setup, credentials, workspaces
  • First deployment - Deploy real infrastructure in 5 minutes
  • Verification - Validate that everything is working correctly

By the end of this section, you’ll have a running Provisioning installation and have deployed your first infrastructure.

Quick Start Guides

Starting from Scratch

  • Prerequisites - System requirements (Nushell 0.109.1+, Docker/Podman optional), cloud account setup, tool installation.

  • Installation - Step-by-step installation: binary download, container, or source build with platform verification.

  • Quick Start - 5-minute guide: install → configure → deploy infrastructure (requires 5 minutes and your AWS/UpCloud credentials).

  • First Deployment - Deploy your first infrastructure: create workspace, configure provider, deploy resources, verify success.

  • Verification - Validate installation: check system health, test CLI commands, verify cloud integration, confirm resource creation.

What You’ll Learn

By completing this section, you’ll know how to:

  1. ✅ Install and configure Provisioning
  2. ✅ Create your first workspace
  3. ✅ Configure cloud providers (AWS, UpCloud, Hetzner, etc.)
  4. ✅ Write simple Nickel infrastructure definitions
  5. ✅ Deploy infrastructure using Provisioning
  6. ✅ Verify and manage deployed resources

Prerequisites Checklist

Before starting, verify you have:

  • Linux, macOS, or Windows with WSL2
  • Nushell 0.109.1 or newer (nu --version)
  • 2GB+ RAM and 100MB disk space
  • Internet connectivity
  • Cloud account (AWS, UpCloud, Hetzner, or local demo mode)
  • Access credentials or API tokens for cloud provider

Missing something? See Prerequisites for detailed instructions.
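The checklist above can be scripted. A minimal sketch using only `command -v` (adjust the tool list to match your setup):

```shell
# Report whether each required tool is on PATH
check() { command -v "$1" >/dev/null 2>&1 && echo "ok: $1" || echo "missing: $1"; }
for tool in nu nickel sops age; do
  check "$tool"
done
```

Any `missing:` line points at a prerequisite to install before continuing.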

5-Minute Quick Start

If you’re impatient, here’s the ultra-quick path:

# 1. Install (2 minutes)
curl -fsSL https://provisioning.io/install.sh | sh

# 2. Verify installation (30 seconds)
provisioning --version
provisioning status

# 3. Create workspace (30 seconds)
provisioning workspace create --name demo

# 4. Add cloud provider (1 minute)
provisioning config set --workspace demo \
  providers.aws.region us-east-1 \
  providers.aws.credentials_source aws_iam

# 5. Deploy infrastructure (1 minute)
provisioning deploy --workspace demo \
  --config examples/simple-instance.ncl

# 6. Verify (30 seconds)
provisioning resource list --workspace demo

For detailed walkthrough, see Quick Start.

Installation Methods

Option 1: Binary Download

# Download and extract
curl -fsSL https://provisioning.io/provisioning-latest-linux.tar.gz | tar xz
sudo mv provisioning /usr/local/bin/
provisioning --version

Option 2: Container

docker run -it provisioning/provisioning:latest \
  provisioning --version

Option 3: Build from Source

git clone https://github.com/provisioning/provisioning.git
cd provisioning
cargo build --release
./target/release/provisioning --version

See Installation for detailed instructions.

Next Steps After Installation

  1. Read Quick Start - 5-minute walkthrough
  2. Complete First Deployment - Deploy real infrastructure
  3. Run Verification - Validate system health
  4. Move to Guides - Learn advanced features
  5. Explore Examples - Real-world scenarios

Common Questions

Q: How long does installation take? A: 5-10 minutes including cloud credential setup.

Q: What if I don’t have a cloud account? A: Try our demo provider in local mode - no cloud account needed.

Q: Can I use Provisioning offline? A: Yes, with local provider. Cloud operations require internet.

Q: What’s the learning curve? A: 30 minutes for basics, days to master advanced features.

Q: Where do I get help? A: See Getting Help or Troubleshooting.

Architecture Overview

Provisioning works in these steps:

1. Install Platform
   ↓
2. Create Workspace
   ↓
3. Add Cloud Provider Credentials
   ↓
4. Write Nickel Configuration
   ↓
5. Deploy Infrastructure
   ↓
6. Monitor & Manage

What’s Next

After getting started:

  • Full Guides → See provisioning/docs/src/guides/
  • Examples → See provisioning/docs/src/examples/
  • Architecture → See provisioning/docs/src/architecture/
  • Features → See provisioning/docs/src/features/
  • API Reference → See provisioning/docs/src/api-reference/

Getting Help

If you get stuck:

  1. Check Troubleshooting
  2. Review Guides for similar scenarios
  3. Search Examples for your use case
  4. Ask in community forums or open a GitHub issue

Prerequisites

Before installing the Provisioning platform, ensure your system meets the following requirements.

Required Software

Nushell 0.109.1+

Nushell is the primary shell and scripting environment for the platform.

Installation:

# macOS (Homebrew)
brew install nushell

# Linux (Cargo)
cargo install nu

# From source
git clone https://github.com/nushell/nushell
cd nushell
cargo install --path .

Verify installation:

nu --version
# Should show: 0.109.1 or higher

Nickel 1.15.1+

Nickel is the infrastructure-as-code language providing type-safe configuration with lazy evaluation.

Installation:

# macOS (Homebrew)
brew install nickel

# Linux (Cargo)
cargo install nickel-lang-cli

# From source
git clone https://github.com/tweag/nickel
cd nickel
cargo install --path cli

Verify installation:

nickel --version
# Should show: 1.15.1 or higher
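For a taste of the type safety Nickel provides, here is a minimal, illustrative contract. This is not part of the platform's schemas; `Server` and its fields are made up for the example:

```nickel
# Illustrative only — a record contract that constrains field types
let Server = {
  name | String,
  cores | Number
} in
{ name = "web-01", cores = 2 } | Server
```

Evaluating this with `nickel eval` succeeds; changing `cores` to a string fails with a contract violation at evaluation time.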

SOPS 3.10.2+

SOPS (Secrets OPerationS) provides encrypted configuration and secrets management.

Installation:

# macOS (Homebrew)
brew install sops

# Linux (binary download)
wget https://github.com/getsops/sops/releases/download/v3.10.2/sops-v3.10.2.linux.amd64
sudo mv sops-v3.10.2.linux.amd64 /usr/local/bin/sops
sudo chmod +x /usr/local/bin/sops

Verify installation:

sops --version
# Should show: 3.10.2 or higher

Age 1.2.1+

Age provides modern encryption for secrets used by SOPS.

Installation:

# macOS (Homebrew)
brew install age

# Linux (binary download)
wget https://github.com/FiloSottile/age/releases/download/v1.2.1/age-v1.2.1-linux-amd64.tar.gz
tar xzf age-v1.2.1-linux-amd64.tar.gz
sudo mv age/age /usr/local/bin/
sudo chmod +x /usr/local/bin/age

Verify installation:

age --version
# Should show: 1.2.1 or higher

K9s 0.50.6+

K9s provides a terminal UI for managing Kubernetes clusters.

Installation:

# macOS (Homebrew)
brew install derailed/k9s/k9s

# Linux (binary download)
wget https://github.com/derailed/k9s/releases/download/v0.50.6/k9s_Linux_amd64.tar.gz
tar xzf k9s_Linux_amd64.tar.gz
sudo mv k9s /usr/local/bin/

Verify installation:

k9s version
# Should show: 0.50.6 or higher

Optional Software

mdBook

For building and serving local documentation.

# Install with Cargo
cargo install mdbook

# Verify
mdbook --version

Docker or Podman

Container runtime for test environments and local development.

# Docker (macOS)
brew install --cask docker

# Podman (Linux)
sudo apt-get install podman

# Verify
docker --version
# or
podman --version

Cargo (Rust)

Required for building platform services and native plugins.

# Install Rust and Cargo
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Verify
cargo --version

Git

Version control for workspace management and configuration.

# Most systems have Git pre-installed
git --version

# Install if needed (macOS)
brew install git

# Install if needed (Linux)
sudo apt-get install git

System Requirements

Minimum Hardware

Development Workstation:

  • CPU: 2 cores
  • RAM: 4 GB
  • Disk: 20 GB available space
  • Network: Internet connection for provider APIs

Production Control Plane:

  • CPU: 4 cores
  • RAM: 8 GB
  • Disk: 50 GB available space (SSD recommended)
  • Network: Stable internet connection, public IP optional

Supported Operating Systems

Primary Support:

  • macOS 12.0+ (Monterey or newer)
  • Linux distributions with kernel 5.0+
    • Ubuntu 20.04 LTS or newer
    • Debian 11 or newer
    • Fedora 35 or newer
    • RHEL 8 or newer

Limited Support:

  • Windows 10/11 via WSL2 (Windows Subsystem for Linux)

Network Requirements

Outbound Access:

  • HTTPS (443) to cloud provider APIs
  • HTTPS (443) to GitHub (for version updates)
  • SSH (22) for server management

Inbound Access (optional, for platform services):

  • Port 8080: HTTP API
  • Port 8081: MCP server
  • Port 5000: Orchestrator service

Cloud Provider Access

At least one cloud provider account with API credentials:

UpCloud:

  • API username and password
  • Account with sufficient quota for servers

AWS:

  • AWS Access Key ID and Secret Access Key
  • IAM permissions for EC2, VPC, EBS operations
  • Account with sufficient EC2 quota

Local Provider:

  • Docker or Podman installed
  • Sufficient local system resources

Permission Requirements

User Permissions

Standard User (recommended):

  • Read/write access to workspace directory
  • Ability to create symlinks for CLI installation
  • SSH key generation capability

Administrative Tasks (optional):

  • Installing CLI to /usr/local/bin (requires sudo)
  • Installing system-wide dependencies
  • Configuring system services

File System Permissions

# Workspace directory
chmod 755 ~/provisioning-workspace

# Configuration files
chmod 600 ~/.config/provisioning/user_config.yaml
chmod 600 ~/.ssh/provisioning_*

# Executable permissions for CLI
chmod +x /path/to/provisioning/core/cli/provisioning

Verification Checklist

Before proceeding to installation, verify all prerequisites:

# Check required tools
nu --version              # 0.109.1+
nickel --version          # 1.15.1+
sops --version            # 3.10.2+
age --version             # 1.2.1+
k9s version               # 0.50.6+

# Check optional tools
mdbook --version          # Latest
docker --version          # Latest
cargo --version           # Latest
git --version             # Latest

# Verify system resources
nproc                     # CPU cores (2+ minimum)
free -h                   # RAM (4GB+ minimum)
df -h ~                   # Disk space (20GB+ minimum)

# Test network connectivity
curl -I https://api.github.com
curl -I https://hub.upcloud.com   # UpCloud API
curl -I https://ec2.amazonaws.com # AWS API

Next Steps

Once all prerequisites are met, proceed to the Installation guide.

Installation

This guide covers installing the Provisioning platform on your system.

Prerequisites

Ensure all prerequisites are met before proceeding.

Installation Steps

Step 1: Clone the Repository

# Clone the provisioning repository
git clone https://github.com/your-org/project-provisioning
cd project-provisioning

Step 2: Add CLI to PATH

The CLI can be installed globally or run directly from the repository.

Option A: Symbolic Link (Recommended):

# Create symbolic link to /usr/local/bin
ln -sf "$(pwd)/provisioning/core/cli/provisioning" /usr/local/bin/provisioning

# Verify installation
provisioning version

Option B: PATH Environment Variable:

# Add to ~/.bashrc, ~/.zshrc, or ~/.config/nushell/env.nu
export PATH="$PATH:/path/to/project-provisioning/provisioning/core/cli"

# Reload shell configuration
source ~/.bashrc  # or ~/.zshrc

Option C: Direct Execution:

# Run directly from repository (no installation needed)
./provisioning/core/cli/provisioning version

Step 3: Verify Installation

# Check CLI is accessible
provisioning version

# Show environment configuration
provisioning env

# Display help
provisioning help

Expected output:

Provisioning Platform
CLI Version: (current version)
Nushell: 0.109.1+
Nickel: 1.15.1+

Step 4: Initialize Configuration

Generate default configuration files:

# Create user configuration directory
mkdir -p ~/.config/provisioning

# Initialize default user configuration (optional)
provisioning config init

This creates ~/.config/provisioning/user_config.yaml with sensible defaults.

Step 5: Configure Cloud Provider Credentials

Configure credentials for at least one cloud provider.

UpCloud:

# ~/.config/provisioning/user_config.yaml
providers:
  upcloud:
    username: "your-username"
    password: "your-password"  # Use SOPS for encryption in production
    default_zone: "de-fra1"

AWS:

# ~/.config/provisioning/user_config.yaml
providers:
  aws:
    access_key_id: "AKIA..."
    secret_access_key: "..."  # Use SOPS for encryption in production
    default_region: "us-east-1"

Local Provider (no credentials required):

# ~/.config/provisioning/user_config.yaml
providers:
  local:
    container_runtime: "docker"  # or "podman"

Step 6: Encrypt Credentials with SOPS

Use SOPS to encrypt sensitive configuration:

# Generate Age encryption key
age-keygen -o ~/.config/provisioning/age-key.txt

# Extract public key
export AGE_PUBLIC_KEY=$(grep "public key:" ~/.config/provisioning/age-key.txt | cut -d: -f2 | tr -d ' ')

# Create .sops.yaml configuration
cat > ~/.config/provisioning/.sops.yaml <<EOF
creation_rules:
  - path_regex: .*user_config\.yaml$
    age: $AGE_PUBLIC_KEY
EOF

# Encrypt configuration file
sops -e -i ~/.config/provisioning/user_config.yaml
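The `path_regex` in `.sops.yaml` is an ordinary regular expression matched against file paths; `grep -E` is a quick way to sanity-check which files a rule would cover:

```shell
# Matches: prints the path
echo "/home/user/.config/provisioning/user_config.yaml" | grep -E 'user_config\.yaml$'
# No match: grep exits non-zero, so the fallback runs and prints "no match"
echo "/home/user/.config/provisioning/other.yaml" | grep -E 'user_config\.yaml$' || echo "no match"
```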

Decrypting (automatic with SOPS):

# Set Age key path
export SOPS_AGE_KEY_FILE=~/.config/provisioning/age-key.txt

# SOPS will automatically decrypt when accessed
provisioning config show

Step 7: Validate Configuration

# Validate all configuration files
provisioning validate config

# Check provider connectivity
provisioning providers

# Show complete environment
provisioning allenv

Optional: Install Platform Services

Platform services provide additional capabilities like orchestration and web UI.

Orchestrator Service (Rust)

# Build orchestrator
cd provisioning/platform/orchestrator
cargo build --release

# Start orchestrator
./target/release/orchestrator --port 5000

Control Center (Web UI)

# Build control center
cd provisioning/platform/control-center
cargo build --release

# Start control center
./target/release/control-center --port 8080

Native Plugins (Performance)

Install Nushell plugins for 10-50x performance improvements:

# Build and register plugins
cd provisioning/core/plugins

# Auth plugin
cargo build --release --package nu_plugin_auth
nu -c "plugin add target/release/nu_plugin_auth"

# KMS plugin
cargo build --release --package nu_plugin_kms
nu -c "plugin add target/release/nu_plugin_kms"

# Orchestrator plugin
cargo build --release --package nu_plugin_orchestrator
nu -c "plugin add target/release/nu_plugin_orchestrator"

# Verify plugins are registered
nu -c "plugin list"

Workspace Initialization

Create your first workspace for managing infrastructure:

# Initialize new workspace
provisioning workspace init my-project
cd my-project

# Verify workspace structure
ls -la

Expected workspace structure:

my-project/
├── infra/          # Infrastructure Nickel schemas
├── config/         # Workspace configuration
├── extensions/     # Custom extensions
└── runtime/        # Runtime data and state

Troubleshooting

Common Issues

CLI not found after installation:

# Verify symlink was created
ls -l /usr/local/bin/provisioning

# Check PATH includes /usr/local/bin
echo $PATH

# Try direct path
/usr/local/bin/provisioning version

Permission denied when creating symlink:

# Use sudo for system-wide installation
sudo ln -sf "$(pwd)/provisioning/core/cli/provisioning" /usr/local/bin/provisioning

# Or use user-local bin directory
mkdir -p ~/.local/bin
ln -sf "$(pwd)/provisioning/core/cli/provisioning" ~/.local/bin/provisioning
export PATH="$PATH:$HOME/.local/bin"

Nushell version mismatch:

# Check Nushell version
nu --version

# Update Nushell
brew upgrade nushell  # macOS
cargo install nu --force  # Linux

Nickel not found:

# Install Nickel
brew install nickel  # macOS
cargo install nickel-lang-cli  # Linux

# Verify
nickel --version

Verification

Confirm successful installation:

# Complete installation check
provisioning version      # CLI version
provisioning env          # Environment configuration
provisioning providers    # Available cloud providers
provisioning validate config  # Configuration validation
provisioning help         # Help system

Next Steps

Once installation is complete, continue to the Quick Start guide.

Quick Start

Deploy your first infrastructure in 5 minutes using the Provisioning platform.

Prerequisites

Complete the Installation guide before starting. A cloud account is optional: this demo uses the local provider.

5-Minute Deployment

Step 1: Create Workspace (30 seconds)

# Initialize workspace
provisioning workspace init quickstart-demo
cd quickstart-demo

Workspace structure created:

quickstart-demo/
├── infra/       # Infrastructure definitions
├── config/      # Workspace configuration
├── extensions/  # Custom providers/taskservs
└── runtime/     # State and logs

Step 2: Define Infrastructure (1 minute)

Create a simple server configuration using Nickel:

# Create infrastructure schema
cat > infra/demo-server.ncl <<'EOF'
{
  metadata = {
    name = "demo-server",
    provider = "local",  # Use local provider for quick demo
    environment = "development"
  },

  infrastructure = {
    servers = [
      {
        name = "web-01",
        plan = "small",
        role = "web"
      }
    ]
  },

  services = {
    taskservs = ["containerd"]  # Simple container runtime
  }
}
EOF

Using UpCloud or AWS? Change provider:

metadata.provider = "upcloud"  # or "aws"

Step 3: Validate Configuration (30 seconds)

# Validate Nickel schema
nickel typecheck infra/demo-server.ncl

# Validate provisioning configuration
provisioning validate config

# Preview what will be created
provisioning server create --check --infra demo-server

Expected output:

Infrastructure Plan: demo-server
Provider: local
Servers to create: 1
  - web-01 (small, role: web)
Task services: containerd

Estimated resources:
  CPU: 2 cores
  RAM: 2 GB
  Disk: 10 GB

Step 4: Create Infrastructure (2 minutes)

# Create server
provisioning server create --infra demo-server --yes

# Monitor progress
provisioning server status web-01

Progress indicators:

Creating server: web-01...
  [████████████████████████] 100% - Server provisioned
  [████████████████████████] 100% - SSH configured
  [████████████████████████] 100% - Network ready

Server web-01 created successfully
IP Address: 10.0.1.10
Status: running

Step 5: Install Task Service (1 minute)

# Install containerd
provisioning taskserv create containerd --infra demo-server

# Verify installation
provisioning taskserv status containerd

Output:

Installing containerd on web-01...
  [████████████████████████] 100% - Dependencies resolved
  [████████████████████████] 100% - Containerd installed
  [████████████████████████] 100% - Service started
  [████████████████████████] 100% - Health check passed

Containerd v1.7.0 installed successfully

Step 6: Verify Deployment (30 seconds)

# SSH into server
provisioning server ssh web-01

# Inside server - verify containerd
sudo systemctl status containerd
sudo ctr version

# Exit server
exit

What You’ve Accomplished

In 5 minutes, you’ve:

  • Created a workspace for infrastructure management
  • Defined infrastructure using type-safe Nickel schemas
  • Validated configuration before deployment
  • Provisioned a server on your chosen provider
  • Installed and configured containerd
  • Verified the deployment

Common Workflows

List Resources

# List all servers
provisioning server list

# List task services
provisioning taskserv list

# Show workspace info
provisioning workspace info

Modify Infrastructure

# Edit infrastructure schema
nano infra/demo-server.ncl

# Validate changes
provisioning validate config --infra demo-server

# Apply changes
provisioning server update --infra demo-server

Cleanup

# Remove task service
provisioning taskserv delete containerd --infra demo-server

# Delete server
provisioning server delete web-01 --yes

# Remove workspace
cd ..
rm -rf quickstart-demo

Next Steps

Deploy Kubernetes

Ready for something more complex?

# infra/kubernetes-cluster.ncl
{
  metadata = {
    name = "k8s-cluster",
    provider = "upcloud"
  },

  infrastructure = {
    servers = [
      {name = "control-01", plan = "medium", role = "control"},
      {name = "worker-01", plan = "large", role = "worker"},
      {name = "worker-02", plan = "large", role = "worker"}
    ]
  },

  services = {
    taskservs = ["kubernetes", "cilium", "rook-ceph"]
  }
}

provisioning server create --infra kubernetes-cluster --yes
provisioning taskserv create kubernetes --infra kubernetes-cluster

Multi-Cloud Deployment

Deploy to multiple providers simultaneously:

# infra/multi-cloud.ncl
{
  batch_workflow = {
    operations = [
      {
        id = "aws-cluster",
        provider = "aws",
        servers = [{name = "aws-web-01", plan = "t3.medium"}]
      },
      {
        id = "upcloud-cluster",
        provider = "upcloud",
        servers = [{name = "upcloud-web-01", plan = "medium"}]
      }
    ]
  }
}

provisioning batch submit infra/multi-cloud.ncl
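Conceptually, the batch engine fans independent operations out in parallel. In plain shell the same idea is background jobs plus `wait`; the `deploy` function here is a hypothetical stand-in for a provider call, not platform code:

```shell
# Stand-in for a per-operation provider call
deploy() { echo "deploying $1"; }

# Run both operations concurrently, then wait for all of them
deploy aws-cluster &
deploy upcloud-cluster &
wait
echo "all operations finished"
```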

Use Interactive Guides

Access built-in guides for comprehensive walkthroughs:

# Quick command reference
provisioning sc

# Complete from-scratch guide
provisioning guide from-scratch

# Customization patterns
provisioning guide customize

Troubleshooting Quick Issues

Server creation fails

# Check provider connectivity
provisioning providers

# Validate credentials
provisioning validate config

# Enable debug mode
provisioning --debug server create --infra demo-server

Task service installation fails

# Check server connectivity
provisioning server ssh web-01

# Verify dependencies
provisioning taskserv check-deps containerd

# Retry installation
provisioning taskserv create containerd --infra demo-server --force

Configuration validation errors

# Check Nickel syntax
nickel typecheck infra/demo-server.ncl

# Show detailed validation errors
provisioning validate config --verbose

# View configuration
provisioning config show

Reference

Essential Commands

# Workspace management
provisioning workspace init <name>
provisioning workspace list
provisioning workspace switch <name>

# Server operations
provisioning server create --infra <name>
provisioning server list
provisioning server status <hostname>
provisioning server ssh <hostname>
provisioning server delete <hostname>

# Task service operations
provisioning taskserv create <service> --infra <name>
provisioning taskserv list
provisioning taskserv status <service>
provisioning taskserv delete <service>

# Configuration
provisioning config show
provisioning validate config
provisioning env

Quick Reference

# Shortcut for fastest reference
provisioning sc

Further Reading

Continue to First Deployment for a production-grade walkthrough.

First Deployment

Comprehensive walkthrough deploying production-ready infrastructure with the Provisioning platform.

Overview

This guide walks through deploying a complete Kubernetes cluster with storage and networking on a cloud provider. You’ll learn workspace management, Nickel schema structure, provider configuration, dependency resolution, and validation workflows.

Deployment Architecture

What we’ll build:

  • 3-node Kubernetes cluster (1 control plane, 2 workers)
  • Cilium CNI for networking
  • Rook-Ceph for persistent storage
  • Container runtime (containerd)
  • Automated dependency resolution
  • Health monitoring

Prerequisites

  • Platform installed
  • Cloud provider credentials configured (UpCloud or AWS recommended)
  • 30-60 minutes for complete deployment

Part 1: Workspace Setup

Create Workspace

# Initialize production workspace
provisioning workspace init production-k8s
cd production-k8s

# Verify structure
ls -la

Workspace contains:

production-k8s/
├── infra/       # Infrastructure Nickel schemas
├── config/      # Workspace configuration
├── extensions/  # Custom providers/taskservs
└── runtime/     # State and logs

Configure Workspace

# Edit workspace configuration
cat > config/provisioning-config.yaml <<'EOF'
workspace:
  name: production-k8s
  environment: production

defaults:
  provider: upcloud  # or aws
  region: de-fra1    # UpCloud Frankfurt
  ssh_key_path: ~/.ssh/provisioning_production

servers:
  default_plan: medium
  auto_backup: true

logging:
  level: info
  format: text
EOF

Part 2: Infrastructure Definition

Define Nickel Schema

Create infrastructure definition with type-safe Nickel:

# Create Kubernetes cluster schema
cat > infra/k8s-cluster.ncl <<'EOF'
{
  metadata = {
    name = "k8s-prod",
    provider = "upcloud",
    environment = "production",
    version = "1.0.0"
  },

  infrastructure = {
    servers = [
      {
        name = "k8s-control-01",
        plan = "medium",      # 4 CPU, 8 GB RAM
        role = "control",
        zone = "de-fra1",
        disk_size_gb = 50,
        backup_enabled = true
      },
      {
        name = "k8s-worker-01",
        plan = "large",       # 8 CPU, 16 GB RAM
        role = "worker",
        zone = "de-fra1",
        disk_size_gb = 100,
        backup_enabled = true
      },
      {
        name = "k8s-worker-02",
        plan = "large",
        role = "worker",
        zone = "de-fra1",
        disk_size_gb = 100,
        backup_enabled = true
      }
    ]
  },

  services = {
    taskservs = [
      "containerd",    # Container runtime (dependency)
      "etcd",          # Key-value store (dependency)
      "kubernetes",    # Core orchestration
      "cilium",        # CNI networking
      "rook-ceph"      # Persistent storage
    ]
  },

  kubernetes = {
    version = "1.28.0",
    pod_cidr = "10.244.0.0/16",
    service_cidr = "10.96.0.0/12",
    container_runtime = "containerd",
    cri_socket = "/run/containerd/containerd.sock"
  },

  networking = {
    cni = "cilium",
    enable_network_policy = true,
    enable_encryption = true
  },

  storage = {
    provider = "rook-ceph",
    replicas = 3,
    storage_class = "ceph-rbd"
  }
}
EOF

Validate Schema

# Type-check Nickel schema
nickel typecheck infra/k8s-cluster.ncl

# Validate against provisioning contracts
provisioning validate config --infra k8s-cluster

Expected output:

Schema validation: PASSED
  - Syntax: Valid Nickel
  - Type safety: All contracts satisfied
  - Dependencies: Resolved (5 taskservs)
  - Provider: upcloud (credentials found)

Part 3: Preview and Validation

Preview Infrastructure

# Dry-run to see what will be created
provisioning server create --check --infra k8s-cluster

Output shows:

Infrastructure Plan: k8s-prod
Provider: upcloud
Region: de-fra1

Servers to create: 3
  - k8s-control-01 (medium, 4 CPU, 8 GB RAM, 50 GB disk)
  - k8s-worker-01 (large, 8 CPU, 16 GB RAM, 100 GB disk)
  - k8s-worker-02 (large, 8 CPU, 16 GB RAM, 100 GB disk)

Task services: 5 (with dependencies resolved)
  1. containerd (dependency for kubernetes)
  2. etcd (dependency for kubernetes)
  3. kubernetes
  4. cilium (requires kubernetes)
  5. rook-ceph (requires kubernetes)

Estimated monthly cost: $xxx.xx
Estimated deployment time: 15-20 minutes

WARNING: Production deployment - ensure backup enabled

Dependency Graph

# Visualize dependency resolution
provisioning taskserv dependencies kubernetes --graph

Shows:

kubernetes
├── containerd (required)
├── etcd (required)
└── cni (cilium) (soft dependency)

cilium
└── kubernetes (required)

rook-ceph
└── kubernetes (required)
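The install order is a topological sort of these edges. Coreutils' `tsort` computes one from "dependency dependent" pairs, which is a handy way to reason about the resolution shown above:

```shell
# Each line is "prerequisite dependent"; tsort prints a valid install order
tsort <<'EOF'
containerd kubernetes
etcd kubernetes
kubernetes cilium
kubernetes rook-ceph
EOF
```

In any order `tsort` prints, kubernetes appears after containerd and etcd, and before cilium and rook-ceph.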

Part 4: Server Provisioning

Create Servers

# Create all servers in parallel
provisioning server create --infra k8s-cluster --yes

Progress tracking:

Creating 3 servers...
  k8s-control-01: [████████████████████████] 100%
  k8s-worker-01:  [████████████████████████] 100%
  k8s-worker-02:  [████████████████████████] 100%

Servers created: 3/3
SSH configured: 3/3
Network ready: 3/3

Servers available:
  k8s-control-01: 94.237.x.x (running)
  k8s-worker-01:  94.237.x.x (running)
  k8s-worker-02:  94.237.x.x (running)

Verify Server Access

# Test SSH connectivity
provisioning server ssh k8s-control-01 -- uname -a

# Check all servers
provisioning server list

Part 5: Service Installation

Install Task Services

# Install all task services (automatic dependency resolution)
provisioning taskserv create kubernetes --infra k8s-cluster

Installation flow (automatic):

Resolving dependencies...
  containerd → etcd → kubernetes → cilium, rook-ceph

Installing task services: 5

[1/5] Installing containerd...
  k8s-control-01: [████████████████████████] 100%
  k8s-worker-01:  [████████████████████████] 100%
  k8s-worker-02:  [████████████████████████] 100%

[2/5] Installing etcd...
  k8s-control-01: [████████████████████████] 100%

[3/5] Installing kubernetes...
  Control plane init: [████████████████████████] 100%
  Worker join: [████████████████████████] 100%
  Cluster ready: [████████████████████████] 100%

[4/5] Installing cilium...
  CNI deployment: [████████████████████████] 100%
  Network policies: [████████████████████████] 100%

[5/5] Installing rook-ceph...
  Operator: [████████████████████████] 100%
  Cluster: [████████████████████████] 100%
  Storage class: [████████████████████████] 100%

All task services installed successfully

Verify Kubernetes Cluster

# SSH to control plane
provisioning server ssh k8s-control-01

# Check cluster status
kubectl get nodes
kubectl get pods --all-namespaces
kubectl get storageclass

Expected output:

NAME             STATUS   ROLES           AGE   VERSION
k8s-control-01   Ready    control-plane   5m    v1.28.0
k8s-worker-01    Ready    <none>          4m    v1.28.0
k8s-worker-02    Ready    <none>          4m    v1.28.0

NAMESPACE     NAME                                READY   STATUS
kube-system   cilium-xxxxx                        1/1     Running
kube-system   cilium-operator-xxxxx               1/1     Running
kube-system   etcd-k8s-control-01                 1/1     Running
rook-ceph     rook-ceph-operator-xxxxx            1/1     Running

NAME              PROVISIONER
ceph-rbd          rook-ceph.rbd.csi.ceph.com

Part 6: Deployment Verification

Health Checks

# Platform-level health check
provisioning cluster status k8s-cluster

# Individual service health
provisioning taskserv status kubernetes
provisioning taskserv status cilium
provisioning taskserv status rook-ceph

Test Application Deployment

# Deploy test application on K8s cluster
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
spec:
  storageClassName: ceph-rbd
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        volumeMounts:
        - name: storage
          mountPath: /usr/share/nginx/html
      volumes:
      - name: storage
        persistentVolumeClaim:
          claimName: test-pvc
EOF

# Verify deployment
kubectl get deployment test-nginx
kubectl get pods -l app=nginx
kubectl get pvc test-pvc

Network Policy Test

# Verify Cilium network policies work
kubectl exec -it <pod-name> -- curl http://test-nginx

Part 7: State Management

View State

# Show current workspace state
provisioning workspace info

# List all resources
provisioning server list
provisioning taskserv list

# Export state for backup
provisioning workspace export > k8s-cluster-state.json

Configuration Backup

# Backup workspace configuration
tar -czf k8s-cluster-backup.tar.gz infra/ config/ runtime/

# Store securely (encrypted)
sops -e k8s-cluster-backup.tar.gz > k8s-cluster-backup.tar.gz.enc

What You’ve Learned

This deployment demonstrated:

  • Workspace creation and configuration
  • Nickel schema structure for infrastructure-as-code
  • Type-safe configuration validation
  • Automatic dependency resolution
  • Multi-server provisioning
  • Task service installation with health checks
  • Kubernetes cluster deployment
  • Storage and networking configuration
  • Verification and testing workflows
  • State management and backup

Next Steps

Verification

Validate the Provisioning platform installation and infrastructure health.

Installation Verification

CLI and Core Tools

# Check CLI version
provisioning version

# Verify Nushell
nu --version  # 0.109.1+

# Verify Nickel
nickel --version  # 1.15.1+

# Check SOPS and Age
sops --version  # 3.10.2+
age --version   # 1.2.1+

# Verify K9s
k9s version  # 0.50.6+
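
These checks can be scripted. A minimal sketch (the `require` helper is illustrative, not part of the CLI) that reports every tool missing from PATH:

```shell
# require TOOL... : report any tool that is not on PATH
require() {
  missing=0
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || { echo "missing: $tool" >&2; missing=1; }
  done
  return "$missing"
}

require nu nickel sops age k9s || echo "install the missing tools before continuing"
```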

Configuration Validation

# Validate all configuration files
provisioning validate config

# Check environment
provisioning env

# Show all configuration
provisioning allenv

Expected output:

Configuration validation: PASSED
  - User config: ~/.config/provisioning/user_config.yaml ✓
  - System defaults: provisioning/config/config.defaults.toml ✓
  - Provider credentials: configured ✓

Provider Connectivity

# List available providers
provisioning providers

# Test provider connection (UpCloud example)
provisioning provider test upcloud

# Test provider connection (AWS example)
provisioning provider test aws

Workspace Verification

Workspace Structure

# List workspaces
provisioning workspace list

# Show current workspace
provisioning workspace current

# Verify workspace structure
ls -la <workspace-name>/

Expected structure:

workspace-name/
├── infra/          # Infrastructure Nickel schemas
├── config/         # Workspace configuration
├── extensions/     # Custom extensions
└── runtime/        # State and logs
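
A quick way to script this structure check (the helper name is illustrative):

```shell
# check_workspace DIR : report which expected top-level dirs exist
check_workspace() {
  for d in infra config extensions runtime; do
    if [ -d "$1/$d" ]; then echo "ok: $d"; else echo "missing: $d"; fi
  done
}

check_workspace "${1:-.}"
```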

Workspace Configuration

# Show workspace configuration
provisioning config show

# Validate workspace-specific config
provisioning validate config --workspace <name>

Infrastructure Verification

Server Health

# List all servers
provisioning server list

# Check server status
provisioning server status <hostname>

# Test SSH connectivity
provisioning server ssh <hostname> -- echo "Connection successful"

Task Service Health

# List installed task services
provisioning taskserv list

# Check service status
provisioning taskserv status <service-name>

# Verify service health
provisioning taskserv health <service-name>

Cluster Health

For Kubernetes clusters:

# SSH to control plane
provisioning server ssh <control-hostname>

# Check cluster nodes
kubectl get nodes

# Check system pods
kubectl get pods -n kube-system

# Check cluster info
kubectl cluster-info

Platform Services Verification

Orchestrator Service

# Check orchestrator status
curl http://localhost:5000/health

# View orchestrator version
curl http://localhost:5000/version

# List active workflows
provisioning workflow list

Expected response:

{
  "status": "healthy",
  "version": "x.x.x",
  "uptime": "2h 15m"
}
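
If jq is not installed, the status field can be extracted with sed alone (the response below is the sample shown above, not live output):

```shell
# Extract the "status" field from a health response without jq.
resp='{"status": "healthy", "version": "x.x.x", "uptime": "2h 15m"}'
status=$(printf '%s' "$resp" | sed -n 's/.*"status": *"\([^"]*\)".*/\1/p')
[ "$status" = "healthy" ] && echo "orchestrator OK"
```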

Control Center

# Check control center
curl http://localhost:8080/health

# Access web UI
open http://localhost:8080      # macOS
xdg-open http://localhost:8080  # Linux

Native Plugins

# List registered plugins
nu -c "plugin list"

# Verify plugins loaded
nu -c "plugin use nu_plugin_auth; plugin use nu_plugin_kms; plugin use nu_plugin_orchestrator"

Security Verification

Secrets Management

# Verify SOPS configuration
cat ~/.config/provisioning/.sops.yaml

# Test encryption/decryption
echo "test secret" > /tmp/test-secret.txt
sops -e /tmp/test-secret.txt > /tmp/test-secret.enc
sops -d /tmp/test-secret.enc
rm /tmp/test-secret.*

SSH Keys

# Verify SSH keys exist
ls -la ~/.ssh/provisioning_*

# Test SSH key permissions
ls -l ~/.ssh/provisioning_* | awk '{print $1}'
# Should show: -rw------- (600)
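
A small sketch to tighten any key file that is not already mode 600 (the helper name and key glob are illustrative; both the GNU and BSD `stat` forms are tried since their flags differ):

```shell
# fix_key_mode FILE... : force private key files to mode 600
fix_key_mode() {
  for key in "$@"; do
    [ -e "$key" ] || continue                        # glob may match nothing
    mode=$(stat -c '%a' "$key" 2>/dev/null || stat -f '%Lp' "$key")
    [ "$mode" = "600" ] || chmod 600 "$key"
  done
}

fix_key_mode "$HOME"/.ssh/provisioning_*
```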

Encrypted Configuration

# Verify user config encryption
file ~/.config/provisioning/user_config.yaml

# Should show: SOPS encrypted data or YAML

Troubleshooting Common Issues

CLI Not Found

# Check PATH
echo $PATH | tr ':' '\n' | grep provisioning

# Verify symlink
ls -l /usr/local/bin/provisioning

# Try direct execution
/path/to/project-provisioning/provisioning/core/cli/provisioning version

Provider Authentication Fails

# Verify credentials are set
provisioning config show | grep -A5 providers

# Test with debug mode
provisioning --debug provider test <provider-name>

# Check network connectivity
ping -c 3 api.upcloud.com  # UpCloud
ping -c 3 ec2.amazonaws.com  # AWS

Nickel Schema Errors

# Type-check schema
nickel typecheck <schema-file>.ncl

# Validate with verbose output
provisioning validate config --verbose

# Format Nickel file
nickel fmt <schema-file>.ncl

Server SSH Fails

# Verify SSH key
ssh-add -l | grep provisioning

# Test direct SSH
ssh -i ~/.ssh/provisioning_rsa root@<server-ip>

# Check server status
provisioning server status <hostname>

Task Service Installation Fails

# Check dependencies
provisioning taskserv dependencies <service>

# Verify server has resources
provisioning server ssh <hostname> -- df -h
provisioning server ssh <hostname> -- free -h

# Enable debug mode
provisioning --debug taskserv create <service>

Health Check Checklist

Complete verification checklist:

# Core tools
[x] Nushell 0.109.1+
[x] Nickel 1.15.1+
[x] SOPS 3.10.2+
[x] Age 1.2.1+
[x] K9s 0.50.6+

# Configuration
[x] User config valid
[x] Provider credentials configured
[x] Workspace initialized

# Provider connectivity
[x] Provider API accessible
[x] Authentication successful

# Infrastructure (if deployed)
[x] Servers running
[x] SSH connectivity working
[x] Task services installed
[x] Cluster healthy

# Platform services (if running)
[x] Orchestrator responsive
[x] Control center accessible
[x] Plugins registered

# Security
[x] Secrets encrypted
[x] SSH keys secured
[x] Configuration protected

Performance Verification

Response Times

# CLI response time
time provisioning version

# Provider API response time
time provisioning provider test <provider>

# Orchestrator response time
time curl http://localhost:5000/health

Acceptable ranges:

  • CLI commands: <1 second
  • Provider API: <3 seconds
  • Orchestrator API: <100ms
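
To check a command against these budgets in a script, measure wall time in milliseconds (GNU `date` with `%N` is assumed; `true` stands in for the real command):

```shell
# Time a command and compare it against a millisecond budget.
start=$(date +%s%N)
true   # stand-in for: provisioning version
end=$(date +%s%N)
elapsed_ms=$(( (end - start) / 1000000 ))
if [ "$elapsed_ms" -lt 1000 ]; then
  echo "within budget: ${elapsed_ms}ms"
else
  echo "too slow: ${elapsed_ms}ms"
fi
```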

Resource Usage

# Check system resources
htop  # Interactive process viewer

# Check disk usage
df -h

# Check memory usage
free -h

Next Steps

Once verification is complete:

Setup & Configuration

Post-installation configuration and system setup for the Provisioning platform.

Overview

After installation, setup configures your system and prepares workspaces for infrastructure deployment.

Setup encompasses three critical phases:

  1. Initial Setup - Environment detection, dependency verification, directory creation
  2. Workspace Setup - Create workspaces, configure providers, initialize schemas
  3. Configuration - Provider credentials, system settings, profiles, validation

This process validates prerequisites, detects your environment, and bootstraps your first workspace.

Quick Setup

Get up and running in four commands:

# 1. Complete initial setup (detects system, creates dirs, validates dependencies)
provisioning setup initial

# 2. Create first workspace (for your infrastructure)
provisioning workspace create --name production

# 3. Add cloud provider credentials (AWS, UpCloud, Hetzner, etc.)
provisioning config set --workspace production \
  extensions.providers.aws.enabled true \
  extensions.providers.aws.config.region us-east-1

# 4. Verify configuration is valid
provisioning validate config

Setup Process Explained

The setup system automatically:

  1. System Detection - Detects OS (Linux, macOS, Windows), CPU architecture, RAM, disk space
  2. Dependency Verification - Validates Nushell, Nickel, SOPS, Age, K9s installation
  3. Directory Structure - Creates ~/.provisioning/, ~/.config/provisioning/, workspace directories
  4. Configuration Creation - Initializes default configuration, security settings, profiles
  5. Workspace Bootstrap - Creates default workspace with basic configuration
  6. Health Checks - Validates installation, runs diagnostic tests

All steps are logged and can be verified with provisioning status.

Setup Configuration Guides

Starting Fresh

  • Initial Setup - First-time system setup: detection, validation, directory creation, default configuration, health checks.

  • Workspace Setup - Create and initialize workspaces: creation, provider configuration, schema management, local customization.

  • Configuration Management - Configure system: providers, credentials, profiles, environment variables, validation rules.

Setup Profiles

Pre-configured setup profiles for different use cases:

Developer Profile

provisioning setup profile --profile developer
# Configures for local development with demo provider

Production Profile

provisioning setup profile --profile production
# Configures for production with security hardening

Custom Profile

provisioning setup profile --custom
# Interactive setup with customization

Directory Structure Created

Setup creates this directory structure:

~/.provisioning/
├── workspaces/           # Workspace data
├── cache/                # Build and dependency cache
├── plugins/              # Installed Nushell plugins
└── detectors/            # Custom detectors

~/.config/provisioning/
├── config.toml          # Main configuration
├── providers/           # Provider credentials
├── secrets/             # Encrypted secrets (via SOPS)
└── profiles/            # Setup profiles

Quick Setup Verification

# Check system status
provisioning status

# Verify all dependencies
provisioning setup verify-dependencies

# Test cloud provider connection
provisioning provider test --name aws

# Validate configuration
provisioning validate config

# Run health checks
provisioning health check

Environment-Specific Setup

For Single Workspace (Simple)

  1. Run Initial Setup
  2. Create one workspace
  3. Configure provider
  4. Done!

For Multiple Workspaces (Team)

  1. Run Initial Setup
  2. Create multiple workspaces per team
  3. Configure shared providers
  4. Set up workspace-specific schemas

For Multi-Cloud (Enterprise)

  1. Run Initial Setup with production profile
  2. Create workspace per environment (dev, staging, prod)
  3. Configure multiple cloud providers
  4. Enable audit logging and security features

Configuration Hierarchy

Configurations load in priority order:

1. Command-line arguments       (highest)
2. Environment variables        (PROVISIONING_*)
3. User profile config         (~/.config/provisioning/)
4. Workspace config            (workspace/config/)
5. System defaults             (provisioning/config/)
                               (lowest)
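
The first layer that sets a value wins. As a sketch in plain shell (variable names are illustrative), default-expansion lets each empty layer fall through to the next:

```shell
system_default="local"                  # lowest priority
workspace_cfg=""                        # workspace layer (unset here)
env_var="${PROVISIONING_PROVIDER:-}"    # environment variable layer
cli_flag=""                             # would come from e.g. --provider

provider="${cli_flag:-${env_var:-${workspace_cfg:-$system_default}}}"
echo "resolved provider: $provider"
```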

Common Setup Tasks

Add a Cloud Provider

provisioning config set --workspace production \
  extensions.providers.aws.config.region us-east-1 \
  extensions.providers.aws.config.credentials_source aws_iam

Configure Secrets Storage

provisioning config set \
  security.secrets.backend secretumvault \
  security.secrets.url http://localhost:8200

Enable Audit Logging

provisioning config set \
  security.audit.enabled true \
  security.audit.retention_days 2555

Set Up Multi-Tenancy

# Create separate workspaces per tenant
provisioning workspace create --name tenant-1
provisioning workspace create --name tenant-2

# Each workspace has isolated configuration

Setup Validation

After setup, validate everything works:

# Run complete validation suite
provisioning setup validate-all

# Or check specific components
provisioning setup validate-system       # OS, dependencies
provisioning setup validate-directories  # Directory structure
provisioning setup validate-config       # Configuration syntax
provisioning setup validate-providers    # Cloud provider connectivity
provisioning setup validate-security     # Security settings

Troubleshooting Setup

If setup fails:

  1. Check logs - provisioning setup logs --tail 20
  2. Verify dependencies - provisioning setup verify-dependencies
  3. Reset configuration - provisioning setup reset --workspace <name>
  4. Run diagnostics - provisioning diagnose setup
  5. Check documentation - See Troubleshooting

Next Steps After Setup

After initial setup completes:

  1. Create workspaces - See Workspace Setup
  2. Configure providers - See Configuration Management
  3. Deploy infrastructure - See Getting Started
  4. Learn features - See Features
  5. Explore examples - See Examples
  • Getting Started → See provisioning/docs/src/getting-started/
  • Features → See provisioning/docs/src/features/
  • Configuration Guide → See provisioning/docs/src/infrastructure/
  • Troubleshooting → See provisioning/docs/src/troubleshooting/

Initial Setup

Configure Provisioning after installation.

Overview

Initial setup validates your environment and prepares Provisioning for workspace creation. The setup process performs system detection, dependency verification, and configuration initialization.

Prerequisites

Before initial setup, ensure:

  1. Provisioning CLI installed and in PATH
  2. Nushell 0.109.0+ installed
  3. Nickel installed
  4. SOPS 3.10.2+ installed
  5. Age 1.2.1+ installed
  6. K9s 0.50.6+ installed (for Kubernetes)

Verify installation:

provisioning version
nu --version
nickel --version
sops --version
age --version

Setup Profiles

Provisioning provides configuration profiles for different use cases:

1. Developer Profile

For local development and testing:

provisioning setup profile --profile developer

Includes:

  • Local provider (simulation environment)
  • Development workspace
  • Test environment configuration
  • Debug logging enabled
  • No MFA required
  • Workspace directory: ~/.provisioning-dev/

2. Production Profile

For production deployments:

provisioning setup profile --profile production

Includes:

  • Encrypted configuration
  • Strict validation rules
  • MFA enabled
  • Audit logging enabled
  • Workspace directory: /opt/provisioning/

3. CI/CD Profile

For unattended automation:

provisioning setup profile --profile cicd

Includes:

  • Headless mode (no TUI prompts)
  • Service account authentication
  • Automated backups
  • Policy enforcement
  • Unattended upgrade support

Configuration Detection

The setup system automatically detects:

# System detection
OS:            $(uname -s)
CPU:           $(lscpu | grep 'CPU(s)' | awk '{print $NF}')
RAM:           $(free -h | grep Mem | awk '{print $2}')
Architecture:  $(uname -m)

The system adapts configuration based on detected resources:

| Detected Resource | Configuration                  |
|-------------------|--------------------------------|
| 2-4 CPU cores     | Solo (single-instance) mode    |
| 4-8 CPU cores     | MultiUser mode (small cluster) |
| 8+ CPU cores      | CICD or Enterprise mode        |
| 4 GB RAM          | Minimal services only          |
| 8 GB RAM          | Standard setup                 |
| 16 GB+ RAM        | Full feature set               |
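
The CPU thresholds above can be expressed as a small helper (the function name is illustrative, and the boundary values follow the "8+" and "4-8" rows; the detection command differs per OS, so both Linux and macOS forms are tried):

```shell
# select_mode CPUS : map a detected core count to a deployment mode
select_mode() {
  if   [ "$1" -ge 8 ]; then echo "cicd-or-enterprise"
  elif [ "$1" -ge 4 ]; then echo "multiuser"
  else                      echo "solo"
  fi
}

select_mode "$(nproc 2>/dev/null || sysctl -n hw.ncpu)"
```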

Setup Steps

Step 1: Validate Environment

provisioning setup validate

Checks:

  • ✅ All dependencies installed
  • ✅ Permission levels
  • ✅ Network connectivity
  • ✅ Disk space (minimum 20GB recommended)

Step 2: Initialize Configuration

provisioning setup init

Creates:

  • ~/.config/provisioning/ - User configuration directory
  • ~/.config/provisioning/user_config.yaml - User settings
  • ~/.provisioning/workspaces/ - Workspace registry

Step 3: Configure Providers

provisioning setup providers

Interactive configuration for:

  • UpCloud (API key, endpoint)
  • AWS (Access key, secret, region)
  • Hetzner (API token)
  • Local (No configuration required)

Store credentials securely:

# Credentials are encrypted with SOPS + Age
~/.config/provisioning/.secrets/providers.enc.yaml

Step 4: Configure Security

provisioning setup security

Sets up:

  • JWT secret for authentication
  • KMS backend (local, Cosmian, AWS KMS)
  • Encryption keys
  • Certificate authorities

Step 5: Verify Installation

provisioning verify

Checks:

  • ✅ All components running
  • ✅ Provider connectivity
  • ✅ Configuration validity
  • ✅ Security systems operational

User Configuration

User configuration is stored in ~/.config/provisioning/user_config.yaml:

# User preferences
user:
  name: "Your Name"
  email: "your@email.com"
  default_region: "us-east-1"

# Workspace settings
workspaces:
  active: "my-project"
  directory: "~/.provisioning/workspaces/"
  registry:
    my-project:
      path: "/home/user/.provisioning/workspaces/workspace_my_project"
      created: "2026-01-16T10:30:00Z"
      template: "default"

# Provider defaults
providers:
  default: "upcloud"
  upcloud:
    endpoint: "https://api.upcloud.com"
  aws:
    region: "us-east-1"

# Security settings
security:
  mfa_enabled: false
  kms_backend: "local"
  encryption: "aes-256-gcm"

# Display options
ui:
  theme: "dark"
  table_format: "compact"
  colors: true

# Logging
logging:
  level: "info"
  output: "console"
  file: "~/.provisioning/logs/provisioning.log"

Environment Variables

Override settings with environment variables:

# Provider selection
export PROVISIONING_PROVIDER=aws

# Workspace selection
export PROVISIONING_WORKSPACE=my-project

# Logging
export PROVISIONING_LOG_LEVEL=debug

# Configuration path
export PROVISIONING_CONFIG=~/.config/provisioning/

# KMS endpoint
export PROVISIONING_KMS_ENDPOINT=http://localhost:8080

Troubleshooting

Missing Dependencies

# Install missing tools
brew install nushell nickel sops age k9s

# Verify
provisioning setup validate

Permission Errors

# Fix directory permissions
chmod 700 ~/.config/provisioning/
chmod 600 ~/.config/provisioning/user_config.yaml

Provider Connection Failed

# Test provider connectivity
provisioning providers test upcloud --verbose

# Verify credentials
cat ~/.config/provisioning/.secrets/providers.enc.yaml

Next Steps

After initial setup:

  1. Create workspace
  2. Configure infrastructure
  3. Deploy first cluster

Workspace Setup

Create and initialize your first Provisioning workspace.

Overview

A workspace is the default organizational unit for all infrastructure work in Provisioning. It groups infrastructure definitions, configurations, extensions, and runtime data in an isolated environment.

Workspace Structure

Every workspace follows a consistent directory structure:

workspace_my_project/
├── config/                     # Workspace configuration
│   ├── workspace.ncl           # Workspace definition (Nickel)
│   ├── provisioning.yaml       # Workspace metadata
│   ├── dev-defaults.toml       # Development environment settings
│   ├── test-defaults.toml      # Testing environment settings
│   └── prod-defaults.toml      # Production environment settings
│
├── infra/                      # Infrastructure definitions
│   ├── servers.ncl             # Server configurations
│   ├── clusters.ncl            # Cluster definitions
│   ├── networks.ncl            # Network configurations
│   └── batch-workflows.ncl     # Batch workflow definitions
│
├── extensions/                 # Workspace-specific extensions (optional)
│   ├── providers/              # Custom providers
│   ├── taskservs/              # Custom task services
│   ├── clusters/               # Custom cluster templates
│   └── workflows/              # Custom workflow definitions
│
└── runtime/                    # Runtime data (gitignored)
    ├── state/                  # Infrastructure state files
    ├── checkpoints/            # Workflow checkpoints
    ├── logs/                   # Operation logs
    └── generated/              # Generated configuration files

Creating a Workspace

Method 1: From Built-in Template

# Create from default template
provisioning workspace init my-project

# Create from specific template
provisioning workspace init my-k8s --template kubernetes-ha

# Create with custom path
provisioning workspace init my-project --path /custom/location

Method 2: From Git Repository

# Clone infrastructure repository
git clone https://github.com/org/infra-repo.git my-infra
cd my-infra

# Import as workspace
provisioning workspace init . --import

Available Templates

Provisioning includes templates for common use cases:

| Template         | Description                       | Use Case                          |
|------------------|-----------------------------------|-----------------------------------|
| default          | Minimal structure                 | General-purpose infrastructure    |
| kubernetes-ha    | HA Kubernetes (3 control planes)  | Production Kubernetes deployments |
| development      | Dev-optimized with Docker Compose | Local testing and development     |
| multi-cloud      | Multiple provider configs         | Multi-cloud deployments           |
| database-cluster | Database-focused                  | Database infrastructure           |
| cicd             | CI/CD pipeline configs            | Automated deployment pipelines    |

List available templates:

provisioning workspace templates

# Show template details
provisioning workspace template show kubernetes-ha

Switching Workspaces

List All Workspaces

provisioning workspace list

# Example output:
NAME              PATH                           LAST_USED          STATUS
my-project        ~/.provisioning/workspace_my   2026-01-16 10:30   Active
dev-env           ~/.provisioning/workspace_dev  2026-01-15 15:45
production        ~/.provisioning/workspace_prod 2026-01-10 09:00

Switch to a Workspace

# Switch workspace
provisioning workspace switch my-project

# Verify switch
provisioning workspace status

# Quick switch (shortcut)
provisioning ws switch dev-env

When you switch workspaces:

  • Active workspace marker updates in user configuration
  • Environment variables update for current session
  • CLI prompt changes (if configured)
  • Last-used timestamp updates

Workspace Registry

The workspace registry is stored in user configuration:

# ~/.config/provisioning/user_config.yaml
workspaces:
  active: my-project
  registry:
    my-project:
      path: ~/.provisioning/workspaces/workspace_my_project
      created: 2026-01-16T10:30:00Z
      last_used: 2026-01-16T14:20:00Z
      template: default

Configuring Workspace

Workspace Definition (workspace.ncl)

# workspace.ncl - Workspace configuration

{
  # Workspace metadata
  name = "my-project"
  description = "My infrastructure project"
  version = "1.0.0"

  # Environment settings
  environment = 'production

  # Default provider
  provider = "upcloud"

  # Region preferences
  region = "de-fra1"

  # Workspace-specific providers (override defaults)
  providers = {
    upcloud = {
      endpoint = "https://api.upcloud.com"
      region = "de-fra1"
    }
    aws = {
      region = "us-east-1"
    }
  }

  # Extensions (inherit from provisioning/extensions/)
  extensions = {
    providers = ["upcloud", "aws"]
    taskservs = ["kubernetes", "docker", "postgres"]
    clusters = ["web", "oci-reg"]
  }
}

Environment-Specific Configuration

Create environment-specific configuration files:

# Development environment
config/dev-defaults.toml:
[server]
plan = "small"
backup_enabled = false

# Production environment
config/prod-defaults.toml:
[server]
plan = "large"
backup_enabled = true
monitoring_enabled = true

Use environment selection:

# Deploy to development
PROVISIONING_ENV=dev provisioning server create

# Deploy to production (stricter validation)
PROVISIONING_ENV=prod provisioning server create --validate

Workspace Metadata (provisioning.yaml)

name: "my-project"
version: "1.0.0"
created: "2026-01-16T10:30:00Z"
owner: "team-infra"

# Provider configuration
providers:
  default: "upcloud"
  upcloud:
    api_endpoint: "https://api.upcloud.com"
    region: "de-fra1"
  aws:
    region: "us-east-1"

# Workspace features
features:
  workspace_switching: true
  batch_workflows: true
  test_environment: true
  security_system: true

# Validation rules
validation:
  strict: true
  check_dependencies: true
  validate_certificates: true

# Backup settings
backup:
  enabled: true
  frequency: "daily"
  retention_days: 30

Initializing Infrastructure

Step 1: Create Infrastructure Definition

Create infra/servers.ncl:

let defaults = import "defaults.ncl" in

{
  servers = [
    defaults.make_server {
      name = "web-01"
      plan = "medium"
      region = "de-fra1"
    },
    defaults.make_server {
      name = "db-01"
      plan = "large"
      region = "de-fra1"
      backup_enabled = true
    }
  ]
}

Step 2: Validate Configuration

# Validate Nickel configuration
nickel typecheck infra/servers.ncl

# Export and validate
nickel export infra/servers.ncl | provisioning validate config

# Verbose validation
provisioning validate config --verbose

Step 3: Export Configuration

# Export Nickel to TOML (generated output)
nickel export --format toml infra/servers.ncl > infra/servers.toml

# The .toml files are auto-generated, don't edit directly

Workspace Security

Securing Credentials

Credentials are encrypted with SOPS + Age:

# Initialize secrets
provisioning sops init

# Create encrypted secrets file
provisioning sops create .secrets/providers.enc.yaml

# Encrypt existing credentials
sops -e -i infra/credentials.toml

Git Workflow

Version control best practices:

# COMMIT (shared with team)
infra/**/*.ncl              # Infrastructure definitions
config/*.toml               # Environment configurations
config/provisioning.yaml    # Workspace metadata
extensions/**/*             # Custom extensions

# GITIGNORE (never commit)
config/local-overrides.toml # Local user settings
runtime/**/*                # Runtime data and state
**/*.secret                 # Credential files
**/*.enc                    # Encrypted files (if not decrypted locally)

Multi-Workspace Strategies

Strategy 1: Separate Workspaces Per Environment

# Create dedicated workspaces
provisioning workspace init myapp-dev
provisioning workspace init myapp-staging
provisioning workspace init myapp-prod

# Each workspace is completely isolated
provisioning ws switch myapp-prod
provisioning server create  # Creates in prod only

Pros: Complete isolation, different credentials, independent state.
Cons: More workspace management, configuration duplication.

Strategy 2: Single Workspace, Multiple Environments

# Single workspace with environment configs
provisioning workspace init myapp

# Deploy to different environments
PROVISIONING_ENV=dev provisioning server create
PROVISIONING_ENV=staging provisioning server create
PROVISIONING_ENV=prod provisioning server create

Pros: Shared configuration, easier maintenance.
Cons: Shared credentials, risk of cross-environment mistakes.

Strategy 3: Hybrid Approach

# Dev workspace for experimentation
provisioning workspace init myapp-dev

# Prod workspace for production only
provisioning workspace init myapp-prod

# Use environment flags within workspaces
provisioning ws switch myapp-prod
PROVISIONING_ENV=prod provisioning cluster deploy

Pros: Balances isolation and convenience.
Cons: More complex to explain to teams.

Workspace Validation

Before deploying infrastructure:

# Validate entire workspace
provisioning validate workspace

# Validate specific configuration
provisioning validate config --infra servers.ncl

# Validate with strict rules
provisioning validate config --strict

Troubleshooting

Workspace Not Found

# Re-register workspace
provisioning workspace register /path/to/workspace

# Or create new workspace
provisioning workspace init my-project

Permission Errors

# Fix workspace permissions
chmod 755 ~/.provisioning/workspaces/workspace_*
chmod 644 ~/.provisioning/workspaces/workspace_*/config/*

Configuration Validation Errors

# Check configuration syntax
nickel typecheck infra/*.ncl

# Inspect generated TOML
nickel export infra/*.ncl | jq '.'

# Debug configuration loading
provisioning validate config --verbose

Next Steps

  1. Configure infrastructure
  2. Deploy servers
  3. Create batch workflows

Configuration Management

Configure Provisioning providers, credentials, and system settings.

Overview

Provisioning uses a hierarchical configuration system with 5 layers of precedence. Configuration is type-safe via Nickel schemas and can be overridden at multiple levels.

Configuration Hierarchy

1. Runtime Arguments        (Highest Priority)
   ↓ (CLI flags: --provider upcloud)
2. Environment Variables
   ↓ (PROVISIONING_PROVIDER=upcloud)
3. Workspace Configuration
   ↓ (workspace/config/provisioning.yaml)
4. Environment Defaults
   ↓ (workspace/config/prod-defaults.toml)
5. System Defaults          (Lowest Priority)
   ├─ User Config (~/.config/provisioning/user_config.yaml)
   └─ Platform Defaults (provisioning/config/config.defaults.toml)

Configuration Sources

1. System Defaults

Built-in defaults for all Provisioning settings:

Location: provisioning/config/config.defaults.toml

# Default provider
[providers]
default = "local"

# Default server configuration
[server]
plan = "small"
region = "us-east-1"
zone = "a"
backup_enabled = false
monitoring = false

# Default workspace
[workspace]
directory = "~/.provisioning/workspaces/"

# Logging
[logging]
level = "info"
output = "console"

# Security
[security]
mfa_enabled = false
encryption = "aes-256-gcm"

2. User Configuration

User-level settings in home directory:

Location: ~/.config/provisioning/user_config.yaml

user:
  name: "Your Name"
  email: "user@example.com"

providers:
  default: "upcloud"
  upcloud:
    endpoint: "https://api.upcloud.com"
    api_key: "${UPCLOUD_API_KEY}"
  aws:
    region: "us-east-1"
    profile: "default"

workspace:
  directory: "~/.provisioning/workspaces/"
  default: "my-project"

logging:
  level: "info"
  file: "~/.provisioning/logs/provisioning.log"
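
Values such as "${UPCLOUD_API_KEY}" are placeholders resolved from the environment at load time. A minimal sketch of that substitution (the platform's actual resolution may differ; unknown variables are left untouched here rather than failing):

```python
import os
import re

_VAR_RE = re.compile(r"\$\{([A-Z0-9_]+)\}")

def expand_env(value: str) -> str:
    """Replace ${VAR} with os.environ['VAR'], leaving unknown vars as-is."""
    return _VAR_RE.sub(lambda m: os.environ.get(m.group(1), m.group(0)), value)

os.environ["UPCLOUD_API_KEY"] = "demo-key"
print(expand_env("${UPCLOUD_API_KEY}"))   # demo-key
print(expand_env("${UNSET_VARIABLE}"))    # ${UNSET_VARIABLE}
```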

3. Workspace Configuration

Workspace-specific settings:

Location: workspace/config/provisioning.yaml

name: "my-project"
environment: "production"

providers:
  default: "upcloud"
  upcloud:
    region: "de-fra1"
    endpoint: "https://api.upcloud.com"

validation:
  strict: true
  require_approval: false

4. Environment Defaults

Environment-specific configuration files:

Files:

  • workspace/config/dev-defaults.toml - Development
  • workspace/config/test-defaults.toml - Testing
  • workspace/config/prod-defaults.toml - Production

Example prod-defaults.toml:

# Production environment overrides
[server]
plan = "large"
backup_enabled = true
monitoring = true
high_availability = true

[security]
mfa_enabled = true
require_approval = true

[workspace]
require_version_tag = true
require_changelog = true

5. Runtime Arguments

Command-line flags with highest priority:

# Override provider
provisioning --provider aws server create

# Override configuration
provisioning --config /custom/config.yaml

# Override environment
provisioning --env production

# Combined
provisioning --provider aws --env production --format json server list

Provider Configuration

Supported Providers

Provider   Status     Configuration
UpCloud    ✅ Active   API endpoint, credentials
AWS        ✅ Active   Region, access keys, profile
Hetzner    ✅ Active   API token, datacenter
Local      ✅ Active   Directory path (no credentials)

Configuring UpCloud

Interactive setup:

provisioning setup providers

Or manually in ~/.config/provisioning/user_config.yaml:

providers:
  default: "upcloud"
  upcloud:
    endpoint: "https://api.upcloud.com"
    api_key: "${UPCLOUD_API_KEY}"
    api_secret: "${UPCLOUD_API_SECRET}"

Store credentials securely:

# Set environment variables
export UPCLOUD_API_KEY="your-api-key"
export UPCLOUD_API_SECRET="your-api-secret"

# Or use SOPS for encrypted storage
provisioning sops set providers.upcloud.api_key "your-api-key"

Configuring AWS

providers:
  aws:
    region: "us-east-1"
    access_key_id: "${AWS_ACCESS_KEY_ID}"
    secret_access_key: "${AWS_SECRET_ACCESS_KEY}"
    profile: "default"

Set environment variables:

export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_REGION="us-east-1"

Configuring Hetzner

providers:
  hetzner:
    api_token: "${HETZNER_API_TOKEN}"
    datacenter: "nbg1-dc3"

Set environment:

export HETZNER_API_TOKEN="your-api-token"

Testing Provider Connectivity

# Test provider connectivity
provisioning providers test upcloud

# Verbose output
provisioning providers test aws --verbose

# Test all configured providers
provisioning providers test --all

Global Configuration Accessors

Provisioning provides 476+ configuration accessors for accessing settings:

# Access configuration values
let config = (provisioning config load)

# Provider settings
$config.providers.default
$config.providers.upcloud.endpoint
$config.providers.aws.region

# Workspace settings
$config.workspace.directory
$config.workspace.default

# Server defaults
$config.server.plan
$config.server.region
$config.server.backup_enabled

# Security settings
$config.security.mfa_enabled
$config.security.encryption

Credential Management

Encrypted Credentials

Use SOPS + Age for encrypted secrets:

# Initialize SOPS configuration
provisioning sops init

# Create encrypted credentials file
provisioning sops create .secrets/providers.enc.yaml

# Edit encrypted file
provisioning sops edit .secrets/providers.enc.yaml

# Decrypt for local use
provisioning sops decrypt .secrets/providers.enc.yaml > .secrets/providers.yaml

Using Environment Variables

Override credentials at runtime:

# Provider credentials
export PROVISIONING_PROVIDER=aws
export AWS_ACCESS_KEY_ID="your-key"
export AWS_SECRET_ACCESS_KEY="your-secret"
export AWS_REGION="us-east-1"

# Execute command
provisioning server create

KMS Integration

For enterprise deployments, use KMS backends:

# Configure KMS backend
provisioning kms init --backend cosmian

# Store credentials in KMS
provisioning kms set providers.upcloud.api_key "value"

# Decrypt on-demand
provisioning kms get providers.upcloud.api_key

Configuration Validation

Validate Configuration

# Validate all configuration
provisioning validate config

# Validate specific section
provisioning validate config --section providers

# Strict validation
provisioning validate config --strict

# Verbose output
provisioning validate config --verbose

Validate Infrastructure

# Validate infrastructure schemas
provisioning validate infra

# Validate specific file
provisioning validate infra workspace/infra/servers.ncl

# Type-check with Nickel
nickel typecheck workspace/infra/servers.ncl

Configuration Merging

Configuration is merged from all layers respecting priority:

# View final merged configuration
provisioning config show

# Export merged configuration
provisioning config export --format yaml

# Show configuration source
provisioning config debug --keys providers.default

Working with Configurations

Export Configuration

# Export as YAML
provisioning config export --format yaml > config.yaml

# Export as JSON
provisioning config export --format json | jq '.'

# Export as TOML
provisioning config export --format toml > config.toml

Import Configuration

# Import from file
provisioning config import --file config.yaml

# Merge with existing
provisioning config merge --file config.yaml

Reset Configuration

# Reset to defaults
provisioning config reset

# Reset specific section
provisioning config reset --section providers

# Backup before reset
provisioning config backup

Environment Variables

Common environment variables for overriding configuration:

# Provider selection
export PROVISIONING_PROVIDER=upcloud
export PROVISIONING_PROVIDER_UPCLOUD_ENDPOINT=https://api.upcloud.com

# Workspace
export PROVISIONING_WORKSPACE=my-project
export PROVISIONING_WORKSPACE_DIRECTORY=~/.provisioning/workspaces/

# Environment
export PROVISIONING_ENV=production

# Logging
export PROVISIONING_LOG_LEVEL=debug
export PROVISIONING_LOG_FILE=~/.provisioning/logs/provisioning.log

# Configuration path
export PROVISIONING_CONFIG=~/.config/provisioning/

# KMS endpoint
export PROVISIONING_KMS_ENDPOINT=http://localhost:8080

# Feature flags
export PROVISIONING_FEATURE_BATCH_WORKFLOWS=true
export PROVISIONING_FEATURE_TEST_ENVIRONMENT=true
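
How such PROVISIONING_* variables might map onto configuration keys can be sketched as follows. This is a hypothetical illustration; the platform's real mapping rules may be richer:

```python
# Hypothetical sketch: collect PROVISIONING_* environment variables into a
# flat override map keyed by the lowercased remainder of the variable name.

def env_overrides(environ: dict, prefix: str = "PROVISIONING_") -> dict:
    overrides = {}
    for name, value in environ.items():
        if name.startswith(prefix):
            key = name[len(prefix):].lower()  # e.g. "provider", "log_level"
            overrides[key] = value
    return overrides

env = {
    "PROVISIONING_PROVIDER": "upcloud",
    "PROVISIONING_LOG_LEVEL": "debug",
    "HOME": "/root",  # ignored: no PROVISIONING_ prefix
}
print(env_overrides(env))  # {'provider': 'upcloud', 'log_level': 'debug'}
```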

Best Practices

1. Secure Credentials

# NEVER commit credentials
echo "config/local-overrides.toml" >> .gitignore
echo ".secrets/" >> .gitignore

# Use SOPS for shared secrets
provisioning sops encrypt config/credentials.toml
git add config/credentials.enc.toml

# Use environment variables for local overrides
export PROVISIONING_PROVIDER_UPCLOUD_API_KEY="your-key"

2. Environment-Specific Configuration

# Development uses different credentials
PROVISIONING_ENV=dev provisioning workspace switch myapp-dev

# Production uses restricted credentials
PROVISIONING_ENV=prod provisioning workspace switch myapp-prod

3. Configuration Documentation

Document your configuration choices:

# provisioning.yaml
configuration:
  provider: "upcloud"
  reason: "Primary European cloud"

  backup_strategy: "daily"
  reason: "Compliance requirement"

  monitoring: "enabled"
  reason: "SLA monitoring"

4. Regular Validation

# Validate before deployment
provisioning validate config --strict

# Export and inspect
provisioning config export --format yaml | less

# Test provider connectivity
provisioning providers test --all

Troubleshooting

Configuration Not Loading

# Check configuration file
cat ~/.config/provisioning/user_config.yaml

# Validate YAML syntax
yamllint ~/.config/provisioning/user_config.yaml

# Debug configuration loading
provisioning config show --verbose

Provider Connection Failed

# Check provider configuration
provisioning config show --section providers

# Test connectivity
provisioning providers test upcloud --verbose

# Check credentials
provisioning kms get providers.upcloud.api_key

Environment Variable Conflicts

# Check environment variables
env | grep PROVISIONING

# Unset conflicting variables
unset PROVISIONING_PROVIDER

# Set correct values
export PROVISIONING_PROVIDER=aws
export AWS_REGION=us-east-1

Next Steps

  1. Create workspace
  2. Deploy infrastructure
  3. Configure batch workflows

User Guides

Step-by-step guides for common workflows, best practices, and advanced operational scenarios using the Provisioning platform.

Overview

This section provides practical guides for:

  • Getting started - From-scratch deployment and initial setup
  • Organization - Workspace management and multi-cloud strategies
  • Automation - Advanced workflow orchestration and GitOps
  • Operations - Disaster recovery, secrets rotation, cost governance
  • Integration - Hybrid cloud setup, zero-trust networks, legacy migration
  • Scaling - Multi-tenant environments, high availability, performance optimization

Each guide includes step-by-step instructions, configuration examples, troubleshooting, and best practices.

Getting Started

I’m completely new to Provisioning

Start with: From Scratch Guide - Complete walkthrough from installation through first deployment with explanations and examples.

I want to organize infrastructure

Read: Workspace Management - Best practices for organizing workspaces, isolation, and multi-team setup.

Core Workflow Guides

Multi-Cloud Deployment - AWS, UpCloud, and Hetzner with provider abstraction

Deployment Pipeline - Dev → Staging → Canary → Production with validation gates

Advanced Operational Guides

Enterprise Features

Quick Navigation

I need to

  • Deploy infrastructure quickly → From Scratch Guide
  • Organize multiple workspaces → Workspace Management
  • Deploy across clouds → Multi-Cloud Deployment
  • Build complex workflows → Advanced Workflow Orchestration
  • Set up GitOps → GitOps Infrastructure Deployment
  • Handle disasters → Disaster Recovery Guide
  • Rotate secrets safely → Secrets Rotation Strategy
  • Connect on-premise to cloud → Hybrid Cloud Deployment
  • Design secure networks → Advanced Networking
  • Build custom extensions → Custom Extensions
  • Migrate legacy systems → Legacy System Migration

Guide Structure

Each guide follows this pattern:

  1. Overview - What you’ll accomplish
  2. Prerequisites - What you need before starting
  3. Architecture - Visual diagram of the solution
  4. Step-by-Step - Detailed instructions with examples
  5. Configuration - Full Nickel configuration examples
  6. Verification - How to validate the deployment
  7. Troubleshooting - Common issues and solutions
  8. Next Steps - How to extend or customize
  9. Best Practices - Lessons learned and recommendations

Learning Paths

Path 1: I’m new to Provisioning (Day 1)

  1. From Scratch Guide - Basic setup
  2. Workspace Management - Organization
  3. Multi-Cloud Deployment - Multi-cloud

Path 2: I need production-ready setup (Week 1)

  1. Workspace Management - Organization
  2. GitOps Infrastructure Deployment - Automation
  3. Disaster Recovery Guide - Resilience
  4. Secrets Rotation Strategy - Security
  5. Advanced Networking - Enterprise networking

Path 3: I’m migrating from legacy (Month-long project)

  1. Legacy System Migration - Migration plan
  2. Advanced Workflow Orchestration - Complex deployments
  3. Hybrid Cloud Deployment - Coexistence
  4. GitOps Infrastructure Deployment - Continuous deployment
  5. Disaster Recovery Guide - Failover strategies

Path 4: I’m building a platform (Team project)

  1. Custom Extensions - Build extensions
  2. Workspace Management - Multi-tenant setup
  3. Advanced Workflow Orchestration - Complex workflows
  4. GitOps Infrastructure Deployment - CD/GitOps
  5. Secrets Rotation Strategy - Security at scale

Related Documentation

  • Getting Started → See provisioning/docs/src/getting-started/
  • Examples → See provisioning/docs/src/examples/
  • Features → See provisioning/docs/src/features/
  • Operations → See provisioning/docs/src/operations/
  • Development → See provisioning/docs/src/development/

From Scratch Guide

Complete walkthrough from zero to production-ready infrastructure deployment using the Provisioning platform. This guide covers installation, configuration, workspace setup, infrastructure definition, and deployment workflows.

Overview

This guide walks you through:

  • Installing prerequisites and the Provisioning platform
  • Configuring cloud provider credentials
  • Creating your first workspace
  • Defining infrastructure using Nickel
  • Deploying servers and task services
  • Setting up Kubernetes clusters
  • Implementing security best practices
  • Monitoring and maintaining infrastructure

Time commitment: 2-3 hours for complete setup
Prerequisites: Linux or macOS, terminal access, cloud provider account (optional)

Phase 1: Installation

System Prerequisites

Ensure your system meets minimum requirements:

# Check OS (Linux or macOS)
uname -s

# Verify available disk space (minimum 10GB recommended)
df -h ~

# Check internet connectivity
ping -c 3 github.com

Install Required Tools

Nushell (Required)

# macOS
brew install nushell

# Linux
cargo install nu

# Verify installation
nu --version  # Expected: 0.109.1+

Nickel (Required)

# macOS
brew install nickel

# Linux
cargo install nickel-lang-cli

# Verify installation
nickel --version  # Expected: 1.15.1+

Additional Tools

# SOPS for secrets management
brew install sops  # macOS
# or download from https://github.com/getsops/sops/releases

# Age for encryption
brew install age  # macOS
cargo install age  # Linux

# K9s for Kubernetes management (optional)
brew install derailed/k9s/k9s

# Verify installations
sops --version    # Expected: 3.10.2+
age --version     # Expected: 1.2.1+
k9s version       # Expected: 0.50.6+

Install Provisioning Platform

Option 1: Quick Install

# Download and run installer
INSTALL_URL="https://raw.githubusercontent.com/yourusername/provisioning/main/install.sh"
curl -sSL "$INSTALL_URL" | bash

# Follow prompts to configure installation directory and path
# Default: ~/.local/bin/provisioning

Installer performs:

  • Downloads latest platform binaries
  • Installs CLI to system PATH
  • Creates default configuration structure
  • Validates dependencies
  • Runs health check

Option 2: Build from Source

# Clone repository
git clone https://github.com/yourusername/provisioning.git
cd provisioning

# Build core CLI
cd provisioning/core
cargo build --release

# Install to local bin
cp target/release/provisioning ~/.local/bin/

# Add to PATH (add to ~/.bashrc or ~/.zshrc)
export PATH="$HOME/.local/bin:$PATH"

# Verify installation
provisioning version

Platform Health Check

# Verify installation
provisioning setup check

# Expected output:
# ✓ Nushell 0.109.1 installed
# ✓ Nickel 1.15.1 installed
# ✓ SOPS 3.10.2 installed
# ✓ Age 1.2.1 installed
# ✓ Provisioning CLI installed
# ✓ Configuration directory created
# Platform ready for use

Phase 2: Initial Configuration

Generate User Configuration

# Create user configuration directory
mkdir -p ~/.config/provisioning

# Generate default user config
provisioning setup init-user-config

Generated configuration structure:

~/.config/provisioning/
├── user_config.yaml      # User preferences and workspace registry
├── credentials/          # Provider credentials (encrypted)
├── age/                  # Age encryption keys
└── cache/                # CLI cache

Configure Encryption

# Generate Age key pair for secrets
age-keygen -o ~/.config/provisioning/age/provisioning.key

# Store public key
age-keygen -y ~/.config/provisioning/age/provisioning.key > ~/.config/provisioning/age/provisioning.pub

# Configure SOPS to use Age
cat > ~/.config/sops/config.yaml <<EOF
creation_rules:
  - path_regex: \.secret\.(yaml|toml|json)$
    age: $(cat ~/.config/provisioning/age/provisioning.pub)
EOF
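
You can verify which filenames a creation rule's path_regex of \.secret\.(yaml|toml|json)$ will capture with a quick check:

```python
import re

# The SOPS creation rule applies only to files ending in .secret.yaml,
# .secret.toml, or .secret.json.
rule = re.compile(r"\.secret\.(yaml|toml|json)$")

print(bool(rule.search("config/providers.secret.yaml")))  # True
print(bool(rule.search("db.secret.toml")))                # True
print(bool(rule.search("config/providers.yaml")))         # False: no .secret. marker
print(bool(rule.search("notes.secret.txt")))              # False: extension not covered
```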

Provider Credentials

Configure credentials for your chosen cloud provider.

UpCloud Configuration

# Edit user config
nano ~/.config/provisioning/user_config.yaml

# Add provider credentials
cat >> ~/.config/provisioning/user_config.yaml <<EOF
providers:
  upcloud:
    username: "your-upcloud-username"
    password_env: "UPCLOUD_PASSWORD"  # Read from environment variable
    default_zone: "de-fra1"
EOF

# Set environment variable (add to ~/.bashrc or ~/.zshrc)
export UPCLOUD_PASSWORD="your-upcloud-password"

AWS Configuration

# Add AWS credentials to user config
cat >> ~/.config/provisioning/user_config.yaml <<EOF
providers:
  aws:
    access_key_id_env: "AWS_ACCESS_KEY_ID"
    secret_access_key_env: "AWS_SECRET_ACCESS_KEY"
    default_region: "eu-west-1"
EOF

# Set environment variables
export AWS_ACCESS_KEY_ID="your-access-key-id"
export AWS_SECRET_ACCESS_KEY="your-secret-access-key"

Local Provider (Development)

# Configure local provider for testing
cat >> ~/.config/provisioning/user_config.yaml <<EOF
providers:
  local:
    backend: "docker"  # or "podman", "libvirt"
    storage_path: "$HOME/.local/share/provisioning/local"
EOF

# Ensure Docker is running
docker info

Validate Configuration

# Validate user configuration
provisioning validate config

# Test provider connectivity
provisioning providers

# Expected output:
# PROVIDER    STATUS     REGION/ZONE
# upcloud     connected  de-fra1
# local       ready      localhost

Phase 3: Create First Workspace

Initialize Workspace

# Create workspace for first project
provisioning workspace init my-first-project

# Navigate to workspace
cd workspace_my_first_project

# Verify structure
ls -la

Workspace structure created:

workspace_my_first_project/
├── infra/                   # Infrastructure definitions (Nickel)
├── config/                  # Workspace configuration
│   ├── provisioning.yaml    # Workspace metadata
│   ├── dev-defaults.toml    # Development defaults
│   ├── test-defaults.toml   # Testing defaults
│   └── prod-defaults.toml   # Production defaults
├── extensions/              # Workspace-specific extensions
│   ├── providers/
│   ├── taskservs/
│   └── workflows/
└── runtime/                 # State and logs (gitignored)
    ├── state/
    ├── checkpoints/
    └── logs/

Configure Workspace

# Edit workspace metadata
nano config/provisioning.yaml

Example workspace configuration:

workspace:
  name: my-first-project
  description: Learning Provisioning platform
  environment: development
  created: 2026-01-16T10:00:00Z

defaults:
  provider: local
  region: localhost
  confirmation_required: false

versioning:
  nushell: "0.109.1"
  nickel: "1.15.1"
  kubernetes: "1.29.0"

Phase 4: Define Infrastructure

Simple Server Configuration

Create your first infrastructure definition using Nickel:

# Create server definition
cat > infra/simple-server.ncl <<'EOF'
{
  metadata = {
    name = "simple-server"
    provider = "local"
    environment = 'development
  }

  infrastructure = {
    servers = [
      {
        name = "dev-web-01"
        plan = "small"
        zone = "localhost"
        disk_size_gb = 25
        backup_enabled = false
        role = 'standalone
      }
    ]
  }

  services = {
    taskservs = ["containerd"]
  }
}
EOF

Validate Infrastructure Schema

# Type-check Nickel schema
nickel typecheck infra/simple-server.ncl

# Validate against platform contracts
provisioning validate config --infra simple-server

# Preview deployment
provisioning server create --check --infra simple-server

Expected output:

Infrastructure Plan: simple-server
Provider: local
Environment: development

Servers to create:
  - dev-web-01 (small, standalone)
    Disk: 25 GB
    Backup: disabled

Task services:
  - containerd

Estimated resources:
  CPU: 1 core
  RAM: 1 GB
  Disk: 25 GB

Validation: PASSED

Deploy Infrastructure

# Create server
provisioning server create --infra simple-server --yes

# Monitor deployment
provisioning server status dev-web-01

Deployment progress:

Creating server: dev-web-01...
  [████████████████████████] 100% - Container created
  [████████████████████████] 100% - Network configured
  [████████████████████████] 100% - SSH ready

Server dev-web-01 created successfully
IP Address: 172.17.0.2
Status: running
Provider: local (docker)

Install Task Service

# Install containerd
provisioning taskserv create containerd --infra simple-server

# Verify installation
provisioning taskserv status containerd

Installation output:

Installing containerd on dev-web-01...
  [████████████████████████] 100% - Dependencies resolved
  [████████████████████████] 100% - Containerd installed
  [████████████████████████] 100% - Service started
  [████████████████████████] 100% - Health check passed

Containerd installed successfully
Version: 1.7.0
Runtime: runc

Verify Deployment

# SSH into server
provisioning server ssh dev-web-01

# Inside server - verify containerd
sudo systemctl status containerd
sudo ctr version

# Exit server
exit

# List all resources
provisioning server list
provisioning taskserv list

Phase 5: Kubernetes Cluster Deployment

Define Kubernetes Infrastructure

# Create Kubernetes cluster definition
cat > infra/k8s-cluster.ncl <<'EOF'
{
  metadata = {
    name = "k8s-dev-cluster"
    provider = "local"
    environment = 'development
  }

  infrastructure = {
    servers = [
      {
        name = "k8s-control-01"
        plan = "medium"
        role = 'control
        zone = "localhost"
        disk_size_gb = 50
      }
      {
        name = "k8s-worker-01"
        plan = "medium"
        role = 'worker
        zone = "localhost"
        disk_size_gb = 50
      }
      {
        name = "k8s-worker-02"
        plan = "medium"
        role = 'worker
        zone = "localhost"
        disk_size_gb = 50
      }
    ]
  }

  services = {
    taskservs = ["containerd", "etcd", "kubernetes", "cilium"]
  }

  kubernetes = {
    version = "1.29.0"
    pod_cidr = "10.244.0.0/16"
    service_cidr = "10.96.0.0/12"
    container_runtime = "containerd"
    cri_socket = "/run/containerd/containerd.sock"
  }
}
EOF
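
One sanity check worth running on any cluster definition: the pod_cidr and service_cidr must not overlap, or service routing breaks. With the values above:

```python
import ipaddress

pod_cidr = ipaddress.ip_network("10.244.0.0/16")
service_cidr = ipaddress.ip_network("10.96.0.0/12")

# 10.96.0.0/12 spans 10.96.0.0-10.111.255.255, well clear of 10.244.0.0/16
print(pod_cidr.overlaps(service_cidr))  # False
print(pod_cidr.num_addresses)           # 65536 pod IPs available
```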

Validate Kubernetes Configuration

# Type-check schema
nickel typecheck infra/k8s-cluster.ncl

# Validate configuration
provisioning validate config --infra k8s-cluster

# Preview deployment
provisioning cluster create --check --infra k8s-cluster

Deploy Kubernetes Cluster

# Create cluster infrastructure
provisioning cluster create --infra k8s-cluster --yes

# Monitor cluster deployment
provisioning cluster status k8s-dev-cluster

Cluster deployment phases:

Phase 1: Creating servers...
  [████████████████████████] 100% - 3/3 servers created

Phase 2: Installing containerd...
  [████████████████████████] 100% - 3/3 nodes ready

Phase 3: Installing etcd...
  [████████████████████████] 100% - Control plane ready

Phase 4: Installing Kubernetes...
  [████████████████████████] 100% - API server available
  [████████████████████████] 100% - Workers joined

Phase 5: Installing Cilium CNI...
  [████████████████████████] 100% - Network ready

Kubernetes cluster deployed successfully
Cluster: k8s-dev-cluster
Control plane: k8s-control-01
Workers: k8s-worker-01, k8s-worker-02

Access Kubernetes Cluster

# Get kubeconfig
provisioning cluster kubeconfig k8s-dev-cluster > ~/.kube/config-dev

# Set KUBECONFIG
export KUBECONFIG=~/.kube/config-dev

# Verify cluster
kubectl get nodes

# Expected output:
# NAME              STATUS   ROLES           AGE   VERSION
# k8s-control-01    Ready    control-plane   5m    v1.29.0
# k8s-worker-01     Ready    <none>          4m    v1.29.0
# k8s-worker-02     Ready    <none>          4m    v1.29.0

# Use K9s for interactive management
k9s

Phase 6: Security Configuration

Enable Audit Logging

# Configure audit logging
cat > config/audit-config.toml <<EOF
[audit]
enabled = true
log_path = "runtime/logs/audit"
retention_days = 90
level = "info"

[audit.filters]
include_commands = ["server create", "server delete", "cluster deploy"]
exclude_users = []
EOF

Configure SOPS for Secrets

# Create secrets file
cat > config/secrets.secret.yaml <<EOF
database:
  password: "changeme-db-password"
  admin_user: "admin"

kubernetes:
  service_account_key: "changeme-sa-key"
EOF

# Encrypt secrets with SOPS
sops -e -i config/secrets.secret.yaml

# Verify encryption
cat config/secrets.secret.yaml  # Should show encrypted content

# Decrypt when needed
sops -d config/secrets.secret.yaml

Enable MFA (Optional)

# Enable multi-factor authentication
provisioning security mfa enable

# Scan QR code with authenticator app
# Enter verification code

Configure RBAC

# Create role definition
cat > config/rbac-roles.yaml <<EOF
roles:
  - name: developer
    permissions:
      - server:read
      - server:create
      - taskserv:read
      - taskserv:install
    deny:
      - cluster:delete
      - config:modify

  - name: operator
    permissions:
      - "*:read"
      - server:*
      - taskserv:*
      - cluster:read
      - cluster:deploy

  - name: admin
    permissions:
      - "*:*"
EOF
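
Wildcard permissions like *:read and server:* can be evaluated with shell-style pattern matching, with deny rules taking precedence. A hedged sketch of how the roles above could behave (the platform's actual RBAC engine may differ):

```python
from fnmatch import fnmatch

def allowed(action, permissions, deny=()):
    """Deny patterns win; otherwise any matching permission pattern grants access."""
    if any(fnmatch(action, pattern) for pattern in deny):
        return False
    return any(fnmatch(action, pattern) for pattern in permissions)

developer = ["server:read", "server:create", "taskserv:read", "taskserv:install"]
developer_deny = ["cluster:delete", "config:modify"]

print(allowed("server:create", developer, developer_deny))   # True
print(allowed("cluster:delete", developer, developer_deny))  # False: denied

operator = ["*:read", "server:*", "taskserv:*", "cluster:read", "cluster:deploy"]
print(allowed("server:delete", operator))  # True: matches "server:*"
```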

Phase 7: Multi-Cloud Deployment

Define Multi-Cloud Infrastructure

# Create multi-cloud definition
cat > infra/multi-cloud.ncl <<'EOF'
{
  batch_workflow = {
    operations = [
      {
        id = "upcloud-frontend"
        provider = "upcloud"
        region = "de-fra1"
        servers = [
          {name = "upcloud-web-01", plan = "medium", role = 'web}
        ]
        taskservs = ["containerd", "nginx"]
      }
      {
        id = "aws-backend"
        provider = "aws"
        region = "eu-west-1"
        servers = [
          {name = "aws-api-01", plan = "t3.medium", role = 'api}
        ]
        taskservs = ["containerd", "docker"]
        dependencies = ["upcloud-frontend"]
      }
      {
        id = "local-database"
        provider = "local"
        region = "localhost"
        servers = [
          {name = "local-db-01", plan = "large", role = 'database}
        ]
        taskservs = ["postgresql"]
      }
    ]
    parallel_limit = 2
  }
}
EOF
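
Conceptually, the batch engine must order operations so that dependencies complete first while respecting parallel_limit. A sketch of that wave-based scheduling for the workflow above (illustrative only, not the platform's scheduler):

```python
# Group operations into waves: each wave contains only operations whose
# dependencies are already complete, capped at parallel_limit per wave.

ops = {
    "upcloud-frontend": [],
    "aws-backend": ["upcloud-frontend"],
    "local-database": [],
}
parallel_limit = 2

done, waves = set(), []
while len(done) < len(ops):
    ready = [op for op, deps in ops.items()
             if op not in done and all(d in done for d in deps)]
    if not ready:
        raise ValueError("dependency cycle detected")
    wave = ready[:parallel_limit]
    waves.append(wave)
    done.update(wave)

print(waves)  # [['upcloud-frontend', 'local-database'], ['aws-backend']]
```

The frontend and database have no dependencies, so they run in parallel; the AWS backend waits for the UpCloud frontend as declared.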

Deploy Multi-Cloud Infrastructure

# Submit batch workflow
provisioning batch submit infra/multi-cloud.ncl

# Monitor workflow progress
provisioning batch status

# View detailed operation status
provisioning batch operations

Phase 8: Monitoring and Maintenance

Platform Health Monitoring

# Check platform health
provisioning health

# View service status
provisioning service status orchestrator
provisioning service status control-center

# View logs
provisioning logs --service orchestrator --tail 100

Infrastructure Monitoring

# List all servers
provisioning server list --all-workspaces

# Show server details
provisioning server info k8s-control-01

# Check task service status
provisioning taskserv list
provisioning taskserv health containerd

Backup Configuration

# Create backup
provisioning backup create --type full --output ~/backups/provisioning-$(date +%Y%m%d).tar.gz

# Schedule automatic backups
provisioning backup schedule daily --time "02:00" --retention 7

Phase 9: Advanced Workflows

Custom Workflow Creation

# Create custom workflow
cat > extensions/workflows/deploy-app.ncl <<'EOF'
{
  workflow = {
    name = "deploy-application"
    description = "Deploy application to Kubernetes"

    steps = [
      {
        name = "build-image"
        action = "docker-build"
        params = {dockerfile = "Dockerfile", tag = "myapp:latest"}
      }
      {
        name = "push-image"
        action = "docker-push"
        params = {image = "myapp:latest", registry = "registry.example.com"}
        depends_on = ["build-image"]
      }
      {
        name = "deploy-k8s"
        action = "kubectl-apply"
        params = {manifest = "k8s/deployment.yaml"}
        depends_on = ["push-image"]
      }
      {
        name = "verify-deployment"
        action = "kubectl-rollout-status"
        params = {deployment = "myapp"}
        depends_on = ["deploy-k8s"]
      }
    ]
  }
}
EOF

Execute Custom Workflow

# Run workflow
provisioning workflow run deploy-application

# Monitor workflow
provisioning workflow status deploy-application

# View workflow history
provisioning workflow history

Troubleshooting

Common Issues

Server Creation Fails

# Enable debug logging
provisioning --debug server create --infra simple-server

# Check provider connectivity
provisioning providers

# Validate credentials
provisioning validate config

Task Service Installation Fails

# Check server connectivity
provisioning server ssh dev-web-01

# Verify dependencies
provisioning taskserv check-deps containerd

# Retry installation
provisioning taskserv create containerd --force

Cluster Deployment Fails

# Check cluster status
provisioning cluster status k8s-dev-cluster

# View cluster logs
provisioning cluster logs k8s-dev-cluster

# Reset and retry
provisioning cluster reset k8s-dev-cluster
provisioning cluster create --infra k8s-cluster

Next Steps

Production Deployment

Advanced Features

Learning Resources

Summary

You’ve completed the from-scratch guide and learned:

  • Platform installation and configuration
  • Provider credential setup
  • Workspace creation and management
  • Infrastructure definition with Nickel
  • Server and task service deployment
  • Kubernetes cluster deployment
  • Security configuration
  • Multi-cloud deployment
  • Monitoring and maintenance
  • Custom workflow creation

Your Provisioning platform is now ready for production use.

Workspace Management

Multi-Cloud Deployment

Comprehensive guide to deploying and managing infrastructure across multiple cloud providers using the Provisioning platform. This guide covers strategies, patterns, and real-world examples for building resilient multi-cloud architectures.

Overview

Multi-cloud deployment enables:

  • Vendor independence - Avoid lock-in to single cloud provider
  • Geographic distribution - Deploy closer to users worldwide
  • Resilience - Survive provider outages or regional failures
  • Cost optimization - Leverage competitive pricing across providers
  • Compliance - Meet data residency and sovereignty requirements
  • Performance - Optimize latency through strategic placement

Multi-Cloud Strategies

Strategy 1: Primary-Backup Architecture

One provider serves production traffic, another provides disaster recovery.

Use cases:

  • Cost-conscious deployments
  • Regulatory backup requirements
  • Testing multi-cloud capabilities

Example topology:

Primary (UpCloud EU)          Backup (AWS US)
├── Production workloads      ├── Standby replicas
├── Active databases          ├── Read-only databases
├── Live traffic              └── Failover ready
└── Real-time sync ────────────>

Pros: Simple management, lower costs, proven failover
Cons: Backup resources underutilized, sync lag

Strategy 2: Active-Active Architecture

Multiple providers serve production traffic simultaneously.

Use cases:

  • High availability requirements
  • Global user base
  • Zero-downtime deployments

Example topology:

UpCloud (EU)                  AWS (US)                      Local (Development)
├── EU traffic                ├── US traffic                ├── Testing
├── Primary database          ├── Primary database          ├── CI/CD
└── Global load balancer ←────┴──────────────────────────────┘

Pros: Maximum availability, optimized latency, full utilization
Cons: Complex management, higher costs, data consistency challenges

Strategy 3: Specialized Workload Distribution

Different providers for different workload types based on strengths.

Use cases:

  • Heterogeneous workloads
  • Cost optimization
  • Leveraging provider-specific services

Example topology:

UpCloud                       AWS                           Local
├── Compute-intensive         ├── Object storage (S3)       ├── Development
├── Kubernetes clusters       ├── Managed databases (RDS)   └── Testing
└── High-performance VMs      └── Serverless (Lambda)

Pros: Optimize for provider strengths, cost-effective, flexible

Cons: Complex integration, vendor-specific knowledge required

Strategy 4: Compliance-Driven Architecture

Provider selection based on regulatory and data residency requirements.

Use cases:

  • GDPR compliance
  • Data sovereignty
  • Industry regulations (HIPAA, PCI-DSS)

Example topology:

UpCloud (EU - GDPR)           AWS (US - FedRAMP)            On-Premises (Sensitive)
├── EU customer data          ├── US customer data          ├── PII storage
├── GDPR-compliant            ├── US compliance             └── Encrypted backups
└── Regional processing       └── Federal workloads

Pros: Meets compliance requirements, data sovereignty

Cons: Geographic constraints, complex data management

Infrastructure Definition

Multi-Provider Server Configuration

Define servers across multiple providers using Nickel:

# infra/multi-cloud-servers.ncl
{
  metadata = {
    name = "multi-cloud-infrastructure"
    environment = 'production
  }

  infrastructure = {
    servers = [
      # UpCloud servers (EU region)
      {
        name = "upcloud-web-01"
        provider = "upcloud"
        zone = "de-fra1"
        plan = "medium"
        role = 'web
        backup_enabled = true
        tags = ["frontend", "europe"]
      }
      {
        name = "upcloud-web-02"
        provider = "upcloud"
        zone = "fi-hel1"
        plan = "medium"
        role = 'web
        backup_enabled = true
        tags = ["frontend", "europe"]
      }

      # AWS servers (US region)
      {
        name = "aws-api-01"
        provider = "aws"
        zone = "us-east-1a"
        plan = "t3.large"
        role = 'api
        backup_enabled = true
        tags = ["backend", "americas"]
      }
      {
        name = "aws-api-02"
        provider = "aws"
        zone = "us-west-2a"
        plan = "t3.large"
        role = 'api
        backup_enabled = true
        tags = ["backend", "americas"]
      }

      # Local provider (development/testing)
      {
        name = "local-test-01"
        provider = "local"
        zone = "localhost"
        plan = "small"
        role = 'test
        backup_enabled = false
        tags = ["testing", "development"]
      }
    ]
  }

  networking = {
    vpn_mesh = true
    cross_provider_routing = true
    dns_strategy = 'geo_distributed
  }
}

Batch Workflow for Multi-Cloud

Use batch workflows for orchestrated multi-cloud deployments:

# infra/multi-cloud-batch.ncl
{
  batch_workflow = {
    name = "global-deployment"
    description = "Deploy infrastructure across three cloud providers"

    operations = [
      {
        id = "upcloud-eu"
        provider = "upcloud"
        region = "de-fra1"
        servers = [
          {name = "upcloud-web-01", plan = "medium", role = 'web}
          {name = "upcloud-db-01", plan = "large", role = 'database}
        ]
        taskservs = ["containerd", "nginx", "postgresql"]
        priority = 1
      }

      {
        id = "aws-us"
        provider = "aws"
        region = "us-east-1"
        servers = [
          {name = "aws-api-01", plan = "t3.large", role = 'api}
          {name = "aws-cache-01", plan = "t3.medium", role = 'cache}
        ]
        taskservs = ["containerd", "docker", "redis"]
        dependencies = ["upcloud-eu"]
        priority = 2
      }

      {
        id = "local-dev"
        provider = "local"
        region = "localhost"
        servers = [
          {name = "local-test-01", plan = "small", role = 'test}
        ]
        taskservs = ["containerd"]
        priority = 3
      }
    ]

    execution = {
      parallel_limit = 2
      retry_failed = true
      max_retries = 3
      checkpoint_enabled = true
    }
  }
}

Deployment Patterns

Pattern 1: Sequential Deployment

Deploy providers one at a time to minimize risk.

# Deploy to primary provider first
provisioning batch submit infra/upcloud-primary.ncl

# Verify primary deployment
provisioning server list --provider upcloud
provisioning server status upcloud-web-01

# Deploy to secondary provider
provisioning batch submit infra/aws-secondary.ncl

# Verify secondary deployment
provisioning server list --provider aws

Advantages:

  • Controlled rollout
  • Easy troubleshooting
  • Clear rollback path

Disadvantages:

  • Slower deployment
  • Sequential dependencies

Pattern 2: Parallel Deployment

Deploy to multiple providers simultaneously for speed.

# Submit multi-cloud batch workflow
provisioning batch submit infra/multi-cloud-batch.ncl

# Monitor all operations
provisioning batch status

# Check progress per provider
provisioning batch operations --filter provider=upcloud
provisioning batch operations --filter provider=aws

Advantages:

  • Fast deployment
  • Efficient resource usage
  • Parallel testing

Disadvantages:

  • Complex failure handling
  • Resource contention
  • Harder troubleshooting

Pattern 3: Blue-Green Multi-Cloud

Deploy new infrastructure in parallel, then switch traffic.

# infra/blue-green-multi-cloud.ncl
{
  deployment = {
    strategy = 'blue_green

    blue_environment = {
      upcloud = {servers = [{name = "upcloud-web-01-blue", role = 'web}]}
      aws = {servers = [{name = "aws-api-01-blue", role = 'api}]}
    }

    green_environment = {
      upcloud = {servers = [{name = "upcloud-web-01-green", role = 'web}]}
      aws = {servers = [{name = "aws-api-01-green", role = 'api}]}
    }

    traffic_switch = {
      type = 'dns
      validation_required = true
      rollback_timeout_seconds = 300
    }
  }
}
# Deploy green environment
provisioning deployment create --infra blue-green-multi-cloud --target green

# Validate green environment
provisioning deployment validate green

# Switch traffic to green
provisioning deployment switch-traffic green

# Decommission blue environment
provisioning deployment delete blue

Network Configuration

Cross-Provider VPN Mesh

Connect servers across providers using VPN mesh:

# infra/vpn-mesh.ncl
{
  networking = {
    vpn_mesh = {
      enabled = true
      encryption = 'wireguard

      peers = [
        {
          name = "upcloud-gateway"
          provider = "upcloud"
          public_ip = "auto"
          private_subnet = "10.0.1.0/24"
        }
        {
          name = "aws-gateway"
          provider = "aws"
          public_ip = "auto"
          private_subnet = "10.0.2.0/24"
        }
        {
          name = "local-gateway"
          provider = "local"
          public_ip = "192.168.1.1"
          private_subnet = "10.0.3.0/24"
        }
      ]

      routing = {
        dynamic_routes = true
        bgp_enabled = false
        static_routes = [
          {from = "10.0.1.0/24", to = "10.0.2.0/24", via = "aws-gateway"}
          {from = "10.0.2.0/24", to = "10.0.1.0/24", via = "upcloud-gateway"}
        ]
      }
    }
  }
}

Global DNS Configuration

Configure geo-distributed DNS for optimal routing:

# infra/global-dns.ncl
{
  dns = {
    provider = 'cloudflare  # or 'route53, 'custom

    zones = [
      {
        name = "example.com"
        type = 'primary

        records = [
          {
            name = "eu"
            type = 'A
            ttl = 300
            values = ["upcloud-web-01.ip", "upcloud-web-02.ip"]
            geo_location = 'europe
          }
          {
            name = "us"
            type = 'A
            ttl = 300
            values = ["aws-api-01.ip", "aws-api-02.ip"]
            geo_location = 'americas
          }
          {
            name = "@"
            type = 'CNAME
            ttl = 60
            value = "global-lb.example.com"
            geo_routing = 'latency_based
          }
        ]
      }
    ]

    health_checks = [
      {target = "upcloud-web-01", interval_seconds = 30}
      {target = "aws-api-01", interval_seconds = 30}
    ]
  }
}

Data Replication

Database Replication Across Providers

Configure cross-provider database replication:

# infra/database-replication.ncl
{
  databases = {
    postgresql = {
      primary = {
        provider = "upcloud"
        server = "upcloud-db-01"
        version = "15"
        replication_role = 'primary
      }

      replicas = [
        {
          provider = "aws"
          server = "aws-db-replica-01"
          version = "15"
          replication_role = 'replica
          replication_lag_max_seconds = 30
          failover_priority = 1
        }
        {
          provider = "local"
          server = "local-db-backup-01"
          version = "15"
          replication_role = 'replica
          replication_lag_max_seconds = 300
          failover_priority = 2
        }
      ]

      replication = {
        method = 'streaming
        ssl_required = true
        compression = true
        conflict_resolution = 'primary_wins
      }
    }
  }
}

Object Storage Sync

Synchronize object storage across providers:

# Configure cross-provider storage sync
cat > infra/storage-sync.ncl <<'EOF'
{
  storage = {
    sync_policy = {
      source = {
        provider = "upcloud"
        bucket = "primary-storage"
        region = "de-fra1"
      }

      destinations = [
        {
          provider = "aws"
          bucket = "backup-storage"
          region = "us-east-1"
          sync_interval_minutes = 15
        }
      ]

      filters = {
        include_patterns = ["*.pdf", "*.jpg", "backups/*"]
        exclude_patterns = ["temp/*", "*.tmp"]
      }

      conflict_resolution = 'timestamp_wins
    }
  }
}
EOF

Kubernetes Multi-Cloud

Cluster Federation

Deploy Kubernetes clusters across providers with federation:

# infra/k8s-federation.ncl
{
  kubernetes_federation = {
    clusters = [
      {
        name = "upcloud-eu-cluster"
        provider = "upcloud"
        region = "de-fra1"
        control_plane_count = 3
        worker_count = 5
        version = "1.29.0"
      }
      {
        name = "aws-us-cluster"
        provider = "aws"
        region = "us-east-1"
        control_plane_count = 3
        worker_count = 5
        version = "1.29.0"
      }
    ]

    federation = {
      enabled = true
      control_plane_cluster = "upcloud-eu-cluster"

      networking = {
        cluster_mesh = true
        service_discovery = 'dns
        cross_cluster_load_balancing = true
      }

      workload_distribution = {
        strategy = 'geo_aware
        prefer_local = true
        failover_enabled = true
      }
    }
  }
}

Multi-Cluster Deployments

Deploy applications across multiple Kubernetes clusters:

# k8s/multi-cluster-deployment.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: multi-cloud-app
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-frontend
  namespace: multi-cloud-app
  labels:
    app: frontend
    region: europe
spec:
  replicas: 3
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
# Deploy to multiple clusters
export UPCLOUD_KUBECONFIG=~/.kube/config-upcloud
export AWS_KUBECONFIG=~/.kube/config-aws

kubectl --kubeconfig $UPCLOUD_KUBECONFIG apply -f k8s/multi-cluster-deployment.yaml
kubectl --kubeconfig $AWS_KUBECONFIG apply -f k8s/multi-cluster-deployment.yaml

# Verify deployments
kubectl --kubeconfig $UPCLOUD_KUBECONFIG get pods -n multi-cloud-app
kubectl --kubeconfig $AWS_KUBECONFIG get pods -n multi-cloud-app

Cost Optimization

Provider Selection by Workload

Optimize costs by choosing the most cost-effective provider per workload:

# infra/cost-optimized.ncl
{
  cost_optimization = {
    workloads = [
      {
        name = "compute-intensive"
        provider = "upcloud"  # Best compute pricing
        plan = "large"
        count = 10
      }
      {
        name = "storage-heavy"
        provider = "aws"  # Best storage pricing with S3
        plan = "medium"
        count = 5
        storage_type = 's3
      }
      {
        name = "development"
        provider = "local"  # Zero cost
        plan = "small"
        count = 3
      }
    ]

    budget_limits = {
      monthly_max_usd = 5000
      alerts = [
        {threshold_percent = 75, notify = "ops-team@example.com"}
        {threshold_percent = 90, notify = "finance@example.com"}
      ]
    }
  }
}

Reserved Instance Strategy

Leverage reserved instances for predictable workloads:

# Configure reserved instances
cat > infra/reserved-instances.ncl <<'EOF'
{
  reserved_instances = {
    upcloud = {
      commitment = 'yearly
      instances = [
        {plan = "medium", count = 5}
        {plan = "large", count = 2}
      ]
    }

    aws = {
      commitment = 'yearly
      instances = [
        {type = "t3.large", count = 3}
        {type = "t3.xlarge", count = 1}
      ]
      savings_plan = true
    }
  }
}
EOF

Monitoring Multi-Cloud

Centralized Monitoring

Deploy unified monitoring across providers:

# infra/monitoring.ncl
{
  monitoring = {
    prometheus = {
      enabled = true
      federation = true

      instances = [
        {provider = "upcloud", region = "de-fra1"}
        {provider = "aws", region = "us-east-1"}
      ]

      scrape_configs = [
        {
          job_name = "upcloud-nodes"
          static_configs = [{targets = ["upcloud-*.internal:9100"]}]
        }
        {
          job_name = "aws-nodes"
          static_configs = [{targets = ["aws-*.internal:9100"]}]
        }
      ]

      remote_write = {
        url = "https://central-prometheus.example.com/api/v1/write"
        compression = true
      }
    }

    grafana = {
      enabled = true
      dashboards = ["multi-cloud-overview", "per-provider", "cost-analysis"]
      alerts = ["high-latency", "provider-down", "budget-exceeded"]
    }
  }
}

Disaster Recovery

Cross-Provider Failover

Configure automatic failover between providers:

# infra/disaster-recovery.ncl
{
  disaster_recovery = {
    primary_provider = "upcloud"
    secondary_provider = "aws"

    failover_triggers = [
      {condition = 'provider_unavailable, action = 'switch_to_secondary}
      {condition = 'health_check_failed, threshold = 3, action = 'switch_to_secondary}
      {condition = 'latency_exceeded, threshold_ms = 1000, action = 'switch_to_secondary}
    ]

    failover_process = {
      dns_ttl_seconds = 60
      health_check_interval_seconds = 10
      automatic = true
      notification_channels = ["email", "slack"]
    }

    backup_strategy = {
      frequency = 'daily
      retention_days = 30
      cross_region = true
      cross_provider = true
    }
  }
}

Best Practices

Configuration Management

  • Use Nickel for all infrastructure definitions
  • Version control all configuration files
  • Use workspace per environment (dev/staging/prod)
  • Implement configuration validation before deployment
  • Maintain provider abstraction where possible
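
Environment-specific overrides and provider abstraction can be expressed with Nickel's merge semantics and default priorities; a minimal sketch, with illustrative field names rather than platform schema:

# Base defaults that any environment may override
let base = {
  plan | default = "small",
  backup_enabled | default = false,
} in

# Production merges over the defaults; merging two conflicting
# non-default values fails at evaluation time, catching bad overrides early
base & {
  plan = "large",
  backup_enabled = true,
}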

Security

  • Encrypt cross-provider communication (VPN, TLS)
  • Use separate credentials per provider
  • Implement RBAC consistently across providers
  • Enable audit logging on all providers
  • Encrypt data at rest and in transit

Deployment

  • Test in single-provider environment first
  • Use batch workflows for complex multi-cloud deployments
  • Enable checkpoints for long-running deployments
  • Implement progressive rollout strategies
  • Maintain rollback procedures

Monitoring

  • Centralize logs and metrics
  • Monitor cross-provider network latency
  • Track costs per provider
  • Alert on provider-specific failures
  • Measure failover readiness

Cost Management

  • Regular cost audits per provider
  • Use reserved instances for predictable loads
  • Implement budget alerts
  • Optimize data transfer costs
  • Consider spot instances for non-critical workloads

Troubleshooting

Provider Connectivity Issues

# Test provider connectivity
provisioning providers

# Test specific provider
provisioning provider test upcloud
provisioning provider test aws

# Debug network connectivity
provisioning network test --from upcloud-web-01 --to aws-api-01

Cross-Provider Communication Failures

# Check VPN mesh status
provisioning network vpn-status

# Test cross-provider routes
provisioning network trace-route --from upcloud-web-01 --to aws-api-01

# Verify firewall rules
provisioning network firewall-check --provider upcloud
provisioning network firewall-check --provider aws

Data Replication Lag

# Check replication status
provisioning database replication-status postgresql

# Force replication sync
provisioning database sync --source upcloud-db-01 --target aws-db-replica-01

# View replication lag metrics
provisioning database metrics --metric replication_lag

Custom Extensions

Create custom providers, task services, and clusters to extend the Provisioning platform for your specific infrastructure needs.

Overview

Extensions allow you to:

  • Add support for new cloud providers
  • Create custom task services for specialized software
  • Define cluster templates for common deployment patterns
  • Integrate with proprietary infrastructure

Extension Types

Providers

Cloud or infrastructure backend integrations.

Use Cases: Custom private cloud, bare metal provisioning, proprietary APIs

Task Services

Installable software components.

Use Cases: Internal applications, specialized databases, custom monitoring

Clusters

Coordinated service groups.

Use Cases: Standard deployment patterns, application stacks, reference architectures

Creating a Custom Provider

Directory Structure

provisioning/extensions/providers/my-provider/
├── provider.ncl          # Provider schema
├── resources/
│   ├── server.nu        # Server operations
│   ├── network.nu       # Network operations
│   └── storage.nu       # Storage operations
└── README.md

Provider Schema (provider.ncl)

{
  name = "my-provider",
  description = "Custom infrastructure provider",

  config_schema = {
    api_endpoint | String,
    api_key | String,
    region | String | default = "default",
    timeout_seconds | Number | default = 300,
  },

  capabilities = {
    servers = true,
    networks = true,
    storage = true,
    load_balancers = false,
  }
}

Server Operations (resources/server.nu)

# Create server
export def "server create" [
  name: string
  plan: string
  --zone: string = "default"
] {
  let config = ($env.PROVIDER_CONFIG | from json)

  # Call provider API; JSON responses are parsed automatically
  http post --content-type application/json $"($config.api_endpoint)/servers" {
    name: $name
    plan: $plan
    zone: $zone
  }
}

# Delete server
export def "server delete" [name: string] {
  let config = ($env.PROVIDER_CONFIG | from json)
  http delete $"($config.api_endpoint)/servers/($name)"
}

# List servers
export def "server list" [] {
  let config = ($env.PROVIDER_CONFIG | from json)
  http get $"($config.api_endpoint)/servers"
}

Creating a Custom Task Service

Directory Structure

provisioning/extensions/taskservs/my-service/
├── service.ncl           # Service schema
├── install.nu            # Installation script
├── configure.nu          # Configuration script
├── health-check.nu       # Health validation
└── README.md

Service Schema (service.ncl)

{
  name = "my-service",
  version = "1.0.0",
  description = "Custom service deployment",

  dependencies = ["kubernetes"],

  config_schema = {
    replicas | Number | default = 3,
    port | Number | default = 8080,
    storage_size_gb | Number | default = 10,
    image | String,
  }
}

Installation Script (install.nu)

export def "taskserv install" [config: record] {
  print $"Installing ($config.name)..."

  # Create namespace
  kubectl create namespace $config.name

  # Build the manifest with string interpolation, then pipe it to kubectl
  $"apiVersion: apps/v1
kind: Deployment
metadata:
  name: ($config.name)
  namespace: ($config.name)
spec:
  replicas: ($config.replicas)
  selector:
    matchLabels:
      app: ($config.name)
  template:
    metadata:
      labels:
        app: ($config.name)
    spec:
      containers:
      - name: app
        image: ($config.image)
        ports:
        - containerPort: ($config.port)" | kubectl apply -f -

  {status: "installed"}
}

Health Check (health-check.nu)

export def "taskserv health" [name: string] {
  let pods = (kubectl get pods -n $name -o json | from json)

  let ready = ($pods.items | all {|p| $p.status.phase == "Running" })

  if $ready {
    {status: "healthy", ready_pods: ($pods.items | length)}
  } else {
    {status: "unhealthy", reason: "pods not running"}
  }
}

Creating a Custom Cluster

Directory Structure

provisioning/extensions/clusters/my-cluster/
├── cluster.ncl           # Cluster definition
├── deploy.nu             # Deployment script
└── README.md

Cluster Schema (cluster.ncl)

{
  name = "my-cluster",
  version = "1.0.0",
  description = "Custom application stack",

  components = {
    servers = [
      {name = "app", count = 3, plan = 'medium},
      {name = "db", count = 1, plan = 'large},
    ],
    services = ["nginx", "postgresql", "redis"],
  },

  config_schema = {
    domain | String,
    app_replicas | Number | default = 3,
    db_storage_gb | Number | default = 100,
  }
}

Testing Extensions

Local Testing

# Test provider operations
provisioning provider test my-provider --local

# Test task service installation
provisioning taskserv install my-service --dry-run

# Validate cluster definition
provisioning cluster validate my-cluster

Integration Testing

# Create test workspace
provisioning workspace create test-extensions

# Deploy extension
provisioning extension deploy my-provider

# Test deployment
provisioning server create test-server --provider my-provider

Extension Best Practices

  1. Define clear schemas - Use Nickel contracts for type safety
  2. Implement health checks - Validate service state
  3. Handle errors gracefully - Return structured error messages
  4. Document configuration - Provide clear examples
  5. Version extensions - Track compatibility
  6. Test thoroughly - Unit and integration tests
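
For point 1, a contract can encode validation rules directly in the schema. A minimal sketch using Nickel's standard library; the Port contract is illustrative, not part of the platform schema:

# Reject out-of-range ports at validation time instead of at deploy time
let Port = std.contract.from_predicate (fun p => p >= 1 && p <= 65535) in
{
  config_schema = {
    replicas | Number | default = 3,
    port | Port | default = 8080,
  }
}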

Publishing Extensions

Extension Registry

Share extensions with the community:

# Package extension
provisioning extension package my-provider

# Publish to registry
provisioning extension publish my-provider --registry community

Private Registry

Host internal extensions:

# Configure private registry
provisioning config set extension_registry https://registry.internal

# Publish privately
provisioning extension publish my-provider --private

Examples

Custom Database Provider

Provider for proprietary database platform:

{
  name = "mydb-provider",
  capabilities = {databases = true},
  config_schema = {
    cluster_endpoint | String,
    admin_token | String,
  }
}

Monitoring Stack Service

Complete monitoring deployment:

{
  name = "monitoring-stack",
  dependencies = ["prometheus", "grafana", "loki"],
  config_schema = {
    retention_days | Number | default = 30,
    alert_email | String,
  }
}

Troubleshooting

Extension Not Loading

# Verify extension structure
provisioning extension validate my-extension

# Check logs
provisioning logs extension-loader --tail 100

Deployment Failures

# Enable debug logging
export PROVISIONING_LOG_LEVEL=debug
provisioning taskserv install my-service

# Check service logs
provisioning taskserv logs my-service

Disaster Recovery

Comprehensive disaster recovery procedures for the Provisioning platform and managed infrastructure.

Overview

Disaster recovery (DR) ensures business continuity through:

  • Automated backups
  • Point-in-time recovery
  • Multi-region failover
  • Data replication
  • DR testing procedures

Recovery Objectives

RTO (Recovery Time Objective)

Target time to restore service:

  • Critical Services: < 1 hour
  • Production Infrastructure: < 4 hours
  • Development Environment: < 24 hours

RPO (Recovery Point Objective)

Maximum acceptable data loss:

  • Production Databases: < 5 minutes (continuous replication)
  • Configuration: < 1 hour (hourly backups)
  • Workspace State: < 15 minutes (incremental backups)
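
These objectives map directly onto backup and replication settings; a sketch with values chosen to match the targets above (field names follow the examples later on this page):

{
  backup = {
    schedule = "0 * * * *",  # hourly configuration backups (RPO < 1 hour)
  },
  replication = {
    replication_lag_max_seconds = 300,  # database RPO < 5 minutes
  },
}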

Backup Strategy

Automated Backups

Configure automatic backups:

{
  backup = {
    enabled = true,
    schedule = "0 */6 * * *",  # Every 6 hours
    retention_days = 30,

    targets = [
      {type = 'workspace_state, enabled = true},
      {type = 'infrastructure_config, enabled = true},
      {type = 'platform_data, enabled = true},
    ],

    storage = {
      backend = 's3,
      bucket = "provisioning-backups",
      encryption = true,
    }
  }
}

Backup Types

Full Backups:

# Full platform backup
provisioning backup create --type full --name "pre-upgrade-$(date +%Y%m%d)"

# Full workspace backup
provisioning workspace backup production --full

Incremental Backups:

# Incremental backup (changed files only)
provisioning backup create --type incremental

# Automated incremental
provisioning config set backup.incremental_enabled true

Snapshot Backups:

# Infrastructure snapshot
provisioning infrastructure snapshot --name "stable-v2"

# Database snapshot
provisioning taskserv backup postgresql --snapshot

Data Replication

Cross-Region Replication

Replicate to secondary region:

{
  replication = {
    enabled = true,
    mode = 'async,

    primary = {region = "eu-west-1", provider = 'aws},
    secondary = {region = "us-east-1", provider = 'aws},

    replication_lag_max_seconds = 300,
  }
}

Database Replication

# Configure database replication
provisioning taskserv configure postgresql --replication \
  --primary db-eu-west-1 \
  --standby db-us-east-1 \
  --sync-mode async

Disaster Scenarios

Complete Region Failure

Procedure:

  1. Detect Failure:
# Check region health
provisioning health check --region eu-west-1
  2. Initiate Failover:
# Promote secondary region
provisioning disaster-recovery failover --to us-east-1 --confirm

# Verify services
provisioning health check --all
  3. Update DNS:
# Point traffic to secondary region
provisioning dns update --region us-east-1
  4. Monitor:
# Watch recovery progress
provisioning disaster-recovery status --follow

Data Corruption

Procedure:

  1. Identify Corruption:
# Validate data integrity
provisioning validate data --workspace production
  2. Find Clean Backup:
# List available backups
provisioning backup list --before "2024-01-15 10:00"

# Verify backup integrity
provisioning backup verify backup-20240115-0900
  3. Restore from Backup:
# Restore to point in time
provisioning restore --backup backup-20240115-0900 \
  --workspace production --confirm

Platform Service Failure

Procedure:

  1. Identify Failed Service:
# Check platform health
provisioning platform health

# Service logs
provisioning platform logs orchestrator --tail 100
  2. Restart Service:
# Restart failed service
provisioning platform restart orchestrator

# Verify health
provisioning platform health orchestrator
  3. Restore from Backup (if needed):
# Restore service data
provisioning platform restore orchestrator \
  --from-backup latest

Failover Procedures

Automated Failover

Configure automatic failover:

{
  failover = {
    enabled = true,
    health_check_interval_seconds = 30,
    failure_threshold = 3,

    primary = {region = "eu-west-1"},
    secondary = {region = "us-east-1"},

    auto_failback = false,  # Manual failback
  }
}

Manual Failover

# Initiate manual failover
provisioning disaster-recovery failover \
  --from eu-west-1 \
  --to us-east-1 \
  --verify-replication \
  --confirm

# Verify failover
provisioning disaster-recovery verify

# Update routing
provisioning disaster-recovery update-routing

Recovery Procedures

Workspace Recovery

# List workspace backups
provisioning workspace backups production

# Restore workspace
provisioning workspace restore production \
  --backup backup-20240115-1200 \
  --target-region us-east-1

# Verify recovery
provisioning workspace validate production

Infrastructure Recovery

# Restore infrastructure from Nickel config
provisioning infrastructure restore \
  --config workspace/infra/production.ncl \
  --region us-east-1

# Restore from snapshot
provisioning infrastructure restore \
  --snapshot infra-snapshot-20240115

# Verify deployment
provisioning infrastructure validate

Platform Recovery

# Reinstall platform services
provisioning platform install --region us-east-1

# Restore platform data
provisioning platform restore --from-backup latest

# Verify platform health
provisioning platform health --all

DR Testing

Test Schedule

  • Monthly: Backup restore test
  • Quarterly: Regional failover drill
  • Annually: Full DR simulation

Backup Restore Test

# Create test workspace
provisioning workspace create dr-test-$(date +%Y%m%d)

# Restore latest backup
provisioning workspace restore dr-test --backup latest

# Validate restore
provisioning workspace validate dr-test

# Cleanup
provisioning workspace delete dr-test --yes

Failover Drill

# Simulate regional failure
provisioning disaster-recovery simulate-failure \
  --region eu-west-1 \
  --duration 30m

# Monitor automated failover
provisioning disaster-recovery status --follow

# Validate services in secondary region
provisioning health check --region us-east-1 --all

# Manual failback after drill
provisioning disaster-recovery failback --to eu-west-1

Monitoring and Alerts

Backup Monitoring

# Check backup status
provisioning backup status

# Verify backup integrity
provisioning backup verify --all --schedule daily

# Alert on backup failures
provisioning alert create backup-failure \
  --condition "backup.status == 'failed'" \
  --notify ops@example.com

Replication Monitoring

# Check replication lag
provisioning replication status

# Alert on lag exceeding threshold
provisioning alert create replication-lag \
  --condition "replication.lag_seconds > 300" \
  --notify ops@example.com

Best Practices

  1. Regular testing - Test DR procedures quarterly
  2. Automated backups - Never rely on manual backups
  3. Multiple regions - Geographic redundancy
  4. Monitor replication - Track replication lag
  5. Document procedures - Keep runbooks updated
  6. Encrypt backups - Protect backup data
  7. Verify restores - Test backup integrity
  8. Automate failover - Reduce recovery time


Infrastructure as Code

Define and manage infrastructure using Nickel, the type-safe configuration language that serves as Provisioning’s source of truth.

Overview

Provisioning’s infrastructure definition system provides:

  • Type-safe configuration via Nickel language with mandatory schema validation and contract enforcement
  • Complete provider support for AWS, UpCloud, Hetzner, Kubernetes, on-premise, and custom platforms
  • 50+ task services for specialized infrastructure operations (databases, monitoring, logging, networking)
  • Pre-built clusters for common patterns (web, OCI registry, cache, distributed computing)
  • Batch workflows with DAG scheduling, parallel execution, and multi-cloud orchestration
  • Schema validation with inheritance, merging, and contracts ensuring correctness
  • Configuration composition with includes, profiles, and environment-specific overrides
  • Version management with semantic versioning and deprecation paths

All infrastructure is defined in Nickel (never TOML), ensuring compile-time correctness and runtime safety.

Infrastructure Configuration Guides

Core Configuration

  • Nickel Guide - Syntax, types, contracts, lazy evaluation, record merging, patterns, best practices for IaC

(Diagram: Nickel validation flow, from type checking to contract validation)

  • Configuration System - Hierarchical loading, environment variables, profiles, composition, inheritance, validation

(Diagram: configuration loading hierarchy, priority from CLI through environment, user, workspace, and system levels)

  • Schemas Reference - Contracts, types, validation rules, inheritance, composition, custom schema development

Resources and Operations

  • Providers Guide - AWS, UpCloud, Hetzner, Kubernetes, on-premise, demo with capabilities, resources, examples

  • Task Services Guide - 50+ services: databases, monitoring, logging, networking, CI/CD, storage

  • Clusters Guide - Web cluster (3-tier), OCI registry, cache cluster, distributed computing, Kubernetes operators

  • Batch Workflows - DAG-based scheduling, parallel execution, logic, error handling, multi-cloud, state management

(Diagram: batch workflow DAG execution with parallel tasks and dependencies)

Advanced Topics

(Diagram: workspace hierarchy covering config, infra, schemas, and extensions)

  • Version Management - Semantic versioning, dependency resolution, compatibility, deprecation, upgrade workflows

  • Performance Optimization - Configuration caching, lazy evaluation, parallel validation, incremental updates

Nickel as Source of Truth

Critical principle: Nickel is the source of truth for ALL infrastructure definitions.

  • Nickel: Type-safe, validated, enforced, source of truth
  • TOML: Generated output only, never hand-edited
  • JSON/YAML: Generated output only, never source definitions
  • KCL: Deprecated, completely replaced by Nickel

This ensures:

  1. Compile-time validation - Errors caught before deployment
  2. Schema enforcement - All configurations conform to contracts
  3. Type safety - No runtime configuration errors
  4. IDE support - Type hints and autocompletion via schema
  5. Evolution - Breaking changes detected and reported
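
As a small illustration of point 1, a type annotation in Nickel turns a mistyped value into a check-time error rather than a runtime surprise (illustrative snippet, not a platform schema):

```nickel
{
  # Annotated as Number, so a string here would be rejected by
  # `nickel typecheck` before anything is deployed:
  cpu_count : Number = 4,

  # cpu_count : Number = "four"   # check-time type error
}
```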

Configuration Hierarchy

Configurations load in order of precedence:

1. Command-line arguments       (highest priority)
2. Environment variables        (PROVISIONING_*)
3. User configuration          (~/.config/provisioning/user.nickel)
4. Workspace configuration     (workspace/config/main.nickel)
5. Infrastructure schemas      (provisioning/schemas/)
6. System defaults            (provisioning/config/defaults.toml)
                               (lowest priority)

Quick Start Paths

I’m new to Nickel

Start with Nickel Guide - language syntax, type system, functions, patterns with infrastructure examples.

I need to define infrastructure

Read Configuration System - how configurations load, compose, and validate.

I want to use AWS/UpCloud/Hetzner

See Providers Guide - capabilities, resources, configuration examples for each cloud.

I need databases, monitoring, logging

Check Task Services Guide - 50+ services with configuration examples.

I want to deploy web applications

Review Clusters Guide - pre-built 3-tier web cluster, load balancer, database, caching.

I need multi-cloud workflows

Learn Batch Workflows - DAG scheduling across multiple providers.

I need multi-tenant setup

Study Multi-Tenancy Patterns - isolation, billing, resource management.

Example Nickel Configuration

{
  extensions = {
    providers = [
      {
        name = "aws",
        version = "1.2.3",
        enabled = true,
        config = {
          region = "us-east-1",
          credentials_source = "aws_iam"
        }
      }
    ]
  },

  infrastructure = {
    networks = [
      {
        name = "main",
        provider = "aws",
        cidr = "10.0.0.0/16",
        subnets = [
          { cidr = "10.0.1.0/24", availability_zone = "us-east-1a" },
          { cidr = "10.0.2.0/24", availability_zone = "us-east-1b" }
        ]
      }
    ],

    instances = [
      {
        name = "web-server-1",
        provider = "aws",
        instance_type = "t3.large",
        image = "ubuntu-22.04",
        network = "main",
        subnet = "10.0.1.0/24"
      }
    ]
  }
}

Schema Contracts

All infrastructure must conform to schemas. Schemas define:

  • Required fields - Must be provided
  • Type constraints - Values must match type
  • Field contracts - Custom validation logic
  • Defaults - Applied automatically
  • Documentation - Inline help and examples
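
A minimal sketch of what such a schema can look like in Nickel (field names are illustrative, not the platform's actual contracts):

```nickel
{
  Server = {
    # Required field: no default, callers must provide it
    name
      | doc "Unique server identifier"
      | String,

    # Type constraint plus an automatically applied default
    plan
      | doc "Server size"
      | [| 'small, 'medium, 'large |]
      | default = 'small,

    # Custom validation logic expressed as a contract
    disk_size_gb
      | doc "Disk size in GB (must be positive)"
      | std.contract.from_predicate (fun n => n > 0)
      | default = 25,
  }
}
```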

Validation and Testing

Before deploying:

  1. Schema validation - provisioning validate config
  2. Syntax checking - provisioning validate syntax
  3. Policy checks - Custom policy validation
  4. Unit tests - Test configuration logic
  5. Integration tests - Dry-run with actual providers

References
  • Provisioning Schemas → See provisioning/schemas/ in codebase
  • Configuration Examples → See provisioning/docs/src/examples/
  • Provider Examples → See provisioning/docs/src/examples/aws-deployment-examples.md
  • Task Services → See provisioning/extensions/ in codebase
  • API Reference → See provisioning/docs/src/api-reference/

Nickel Guide

Comprehensive guide to using Nickel as the infrastructure-as-code language for the Provisioning platform.

Critical Principle: Nickel is Source of Truth

TYPE-SAFETY ALWAYS REQUIRED: ALL configurations MUST be type-safe and validated via Nickel. TOML is NOT acceptable as source of truth. Validation is NOT optional, NOT “progressive”, NOT “production-only”. This applies to ALL profiles (developer, production, cicd).

Nickel is the PRIMARY IaC language. TOML files are GENERATED OUTPUT ONLY, never the source.

Why Nickel

Nickel provides:

  • Type Safety: Static type checking catches errors before deployment
  • Lazy Evaluation: Efficient configuration composition and merging
  • Contract System: Schema validation with gradual typing
  • Record Merging: Powerful composition without duplication
  • LSP Support: IDE integration for autocomplete and validation
  • Human-Readable: Clear syntax for infrastructure definition

Installation

# macOS (Homebrew)
brew install nickel

# Linux (Cargo)
cargo install nickel-lang-cli

# Verify installation
nickel --version  # 1.15.1+

Core Concepts

Records and Fields

Records are the fundamental data structure in Nickel:

{
  name = "my-server",
  plan = "medium",
  zone = "de-fra1"
}

Type Annotations

Add type safety with contracts:

{
  name : String = "my-server",
  plan : String = "medium",
  cpu_count : Number = 4,
  enabled : Bool = true
}

Record Merging

Compose configurations by merging records:

let base_config = {
  provider = "upcloud",
  region = "de-fra1"
} in

let server_config = base_config & {
  name = "web-01",
  plan = "medium"
} in

server_config

Result:

{
  provider = "upcloud",
  region = "de-fra1",
  name = "web-01",
  plan = "medium"
}

Contracts (Schema Validation)

Define contracts to validate structure:

let ServerContract = {
  name | String,
  plan | String | default = "small",
  zone | String | default = "de-fra1",
  cpu | Number | optional
} in

{
  name = "my-server",
  plan = "large"
} | ServerContract

Three-File Pattern (Provisioning Standard)

The platform uses a standardized three-file pattern for all schemas:

1. contracts.ncl - Type Definitions

Defines the schema contracts:

# contracts.ncl
{
  Server = {
    name | String
    plan | String | default = "small"
    zone | String | default = "de-fra1"
    disk_size_gb | Number | default = 25
    backup_enabled | Bool | default = false
    role | | [ 'control, 'worker, 'standalone | ] | optional
  }

  Infrastructure = {
    servers | Array Server
    provider | String
    environment | | [ 'development, 'staging, 'production | ]
  }
}

2. defaults.ncl - Default Values

Provides sensible defaults:

# defaults.ncl
{
  server = {
    name = "unnamed-server"
    plan = "small"
    zone = "de-fra1"
    disk_size_gb = 25
    backup_enabled = false
  }

  infrastructure = {
    servers = []
    provider = "local"
    environment = 'development
  }
}

3. main.ncl - Entry Point

Combines contracts and defaults, provides makers:

# main.ncl
let contracts_lib = import "./contracts.ncl" in
let defaults_lib = import "./defaults.ncl" in

{
  # Direct access to defaults (for inspection)
  defaults = defaults_lib,

  # Convenience makers (90% of use cases)
  make_server | not_exported = fun overrides =>
    defaults_lib.server & overrides,

  make_infrastructure | not_exported = fun overrides =>
    defaults_lib.infrastructure & overrides,

  # Default instances (bare defaults)
  DefaultServer = defaults_lib.server,
  DefaultInfrastructure = defaults_lib.infrastructure
}

Usage Example

# user-infra.ncl
let infra_lib = import "provisioning/schemas/infrastructure/main.ncl" in

infra_lib.make_infrastructure {
  provider = "upcloud"
  environment = 'production
  servers = [
    infra_lib.make_server {
      name = "web-01"
      plan = "medium"
      backup_enabled = true
    }
    infra_lib.make_server {
      name = "web-02"
      plan = "medium"
      backup_enabled = true
    }
  ]
}

Hybrid Interface Pattern

Records can be used both as functions (makers) and as plain data:

let config_lib = import "./config.ncl" in

# Use as function (with overrides)
let custom_config = config_lib.make_server { name = "custom" } in

# Use as plain data (defaults)
let default_config = config_lib.DefaultServer in

{
  custom = custom_config,
  default = default_config
}

Record Merging Strategies

Priority Merging (Default)

let base = { a = 1, b | default = 2 } in
let override = { b = 3, c = 4 } in
base & override
# Result: { a = 1, b = 3, c = 4 }

Recursive Merging

let base = {
  server = { cpu = 2, ram | default = 4 }
} in

let override = {
  server = { ram = 8, disk = 100 }
} in

base & override
# Result: { server = { cpu = 2, ram = 8, disk = 100 } }

Lazy Evaluation

Nickel evaluates expressions lazily, only when needed:

let environment = 'development in
let expensive_computation = std.string.join " " ["a", "b", "c"] in

{
  # Only evaluated when accessed
  computed_field = expensive_computation,

  # Conditional evaluation
  conditional = if environment == 'production then
    expensive_computation
  else
    "dev-value"
}

Schema Organization

The platform organizes Nickel schemas by domain:

provisioning/schemas/
├── main.ncl                  # Top-level entry point
├── config/                   # Configuration schemas
│   ├── settings/
│   │   ├── main.ncl
│   │   ├── contracts.ncl
│   │   └── defaults.ncl
│   └── defaults/
│       ├── main.ncl
│       ├── contracts.ncl
│       └── defaults.ncl
├── infrastructure/           # Infrastructure definitions
│   ├── servers/
│   ├── networks/
│   └── storage/
├── deployment/               # Deployment schemas
├── services/                 # Service configurations
├── operations/               # Operational schemas
└── generator/                # Runtime schema generation

Type System

Primitive Types

{
  string_field : String = "text",
  number_field : Number = 42,
  bool_field : Bool = true
}

Array Types

{
  names : Array String = ["alice", "bob", "charlie"],
  ports : Array Number = [80, 443, 8080]
}

Enum Types

{
  environment : [| 'development, 'staging, 'production |] = 'production,
  role : [| 'control, 'worker, 'standalone |] = 'worker
}

Optional Fields

{
  required_field : String = "value",
  optional_field | String | optional
}

Default Values

{
  with_default | String | default = "default-value"
}

Validation Patterns

Runtime Validation

let validate_plan = fun plan =>
  if plan == "small" || plan == "medium" || plan == "large" then
    plan
  else
    std.fail_with "Invalid plan: must be small, medium, or large"
in

{
  plan = validate_plan "medium"
}

Contract-Based Validation

let PlanContract = [| 'small, 'medium, 'large |] in

{
  plan | PlanContract = 'medium
}

Real-World Examples

Simple Server Configuration

{
  metadata = {
    name = "demo-server",
    provider = "upcloud",
    environment = 'development
  },

  infrastructure = {
    servers = [
      {
        name = "web-01",
        plan = "medium",
        zone = "de-fra1",
        disk_size_gb = 50,
        backup_enabled = true,
        role = 'standalone
      }
    ]
  },

  services = {
    taskservs = ["containerd", "docker"]
  }
}

Kubernetes Cluster Configuration

{
  metadata = {
    name = "k8s-prod",
    provider = "upcloud",
    environment = 'production
  },

  infrastructure = {
    servers = [
      {
        name = "k8s-control-01",
        plan = "medium",
        role = 'control,
        zone = "de-fra1",
        disk_size_gb = 50,
        backup_enabled = true
      },
      {
        name = "k8s-worker-01",
        plan = "large",
        role = 'worker,
        zone = "de-fra1",
        disk_size_gb = 100,
        backup_enabled = true
      },
      {
        name = "k8s-worker-02",
        plan = "large",
        role = 'worker,
        zone = "de-fra1",
        disk_size_gb = 100,
        backup_enabled = true
      }
    ]
  },

  services = {
    taskservs = ["containerd", "etcd", "kubernetes", "cilium", "rook-ceph"]
  },

  kubernetes = {
    version = "1.28.0",
    pod_cidr = "10.244.0.0/16",
    service_cidr = "10.96.0.0/12",
    container_runtime = "containerd",
    cri_socket = "/run/containerd/containerd.sock"
  }
}

Multi-Provider Batch Workflow

{
  batch_workflow = {
    operations = [
      {
        id = "aws-cluster",
        provider = "aws",
        region = "us-east-1",
        servers = [
          { name = "aws-web-01", plan = "t3.medium" }
        ]
      },
      {
        id = "upcloud-cluster",
        provider = "upcloud",
        region = "de-fra1",
        servers = [
          { name = "upcloud-web-01", plan = "medium" }
        ],
        dependencies = ["aws-cluster"]
      }
    ],
    parallel_limit = 2
  }
}

Validation Workflow

Type-Check Schema

# Check syntax and types
nickel typecheck infra/my-cluster.ncl

# Export to JSON (validates during export)
nickel export infra/my-cluster.ncl

# Export to TOML (generated output only)
nickel export --format toml infra/my-cluster.ncl > config.toml

Platform Validation

# Validate against platform contracts
provisioning validate config --infra my-cluster

# Verbose validation
provisioning validate config --verbose

IDE Integration

Language Server (nickel-lang-lsp)

Install LSP for IDE support:

# Install LSP server
cargo install nickel-lang-lsp

# Configure your editor (VS Code example)
# Install "Nickel" extension from marketplace

Features:

  • Syntax highlighting
  • Type checking on save
  • Autocomplete
  • Hover documentation
  • Go to definition

VS Code Configuration

{
  "nickel.lsp.command": "nickel-lang-lsp",
  "nickel.lsp.args": ["--stdio"],
  "nickel.format.onSave": true
}

Common Patterns

Environment-Specific Configuration

let env_configs = {
  development = {
    plan = "small",
    backup_enabled = false
  },
  production = {
    plan = "large",
    backup_enabled = true
  }
} in

let environment = 'production in

{
  servers = [
    env_configs."%{std.string.from_enum environment}" & {
      name = "server-01"
    }
  ]
}

Configuration Composition

let base_server = {
  zone | default = "de-fra1",
  backup_enabled | default = false
} in

let prod_overrides = {
  backup_enabled = true,
  disk_size_gb = 100
} in

{
  servers = [
    base_server & { name = "dev-01" },
    base_server & prod_overrides & { name = "prod-01" }
  ]
}

Migration from TOML

TOML is ONLY for generated output. Source is always Nickel.

# Generate TOML from Nickel (if needed for external tools)
nickel export --format toml infra/cluster.ncl > cluster.toml

# NEVER edit cluster.toml directly - edit cluster.ncl instead

Best Practices

  1. Use Three-File Pattern: Separate contracts, defaults, and main entry
  2. Type Everything: Add type annotations for all fields
  3. Validate Early: Run nickel typecheck before deployment
  4. Use Makers: Leverage maker functions for composition
  5. Document Contracts: Add comments explaining schema requirements
  6. Avoid Duplication: Use record merging and defaults
  7. Test Locally: Export and verify before deploying
  8. Version Schemas: Track schema changes in version control

Debugging

Type Errors

# Detailed type error messages
nickel typecheck --color always infra/cluster.ncl

Schema Inspection

# Export to JSON for inspection
nickel export infra/cluster.ncl | jq '.'

# Check specific field
nickel export infra/cluster.ncl | jq '.metadata'

Format Code

# Auto-format Nickel files
nickel fmt infra/cluster.ncl

# Check formatting without modifying
nickel fmt --check infra/cluster.ncl

Configuration System

The Provisioning platform uses a hierarchical configuration system with Nickel as the source of truth for infrastructure definitions and TOML/YAML for application settings.

Configuration Hierarchy

Configuration is loaded in order of precedence (highest to lowest):

1. Runtime Arguments    - CLI flags (--config, --workspace, etc.)
2. Environment Variables - PROVISIONING_* environment variables
3. User Configuration   - ~/.config/provisioning/user_config.yaml
4. Infrastructure Config - Nickel schemas in workspace/provisioning
5. System Defaults      - provisioning/config/config.defaults.toml

Higher-precedence sources override lower-precedence ones, allowing flexible configuration management across environments.

Configuration Files

System Defaults

Located at provisioning/config/config.defaults.toml:

[general]
log_level = "info"
workspace_root = "./workspaces"

[providers]
default_provider = "local"

[orchestrator]
max_parallel_tasks = 4
checkpoint_enabled = true

User Configuration

Located at ~/.config/provisioning/user_config.yaml:

general:
  preferred_editor: nvim
  default_workspace: production

providers:
  upcloud:
    default_zone: fi-hel1
  aws:
    default_region: eu-west-1

Workspace Configuration

Nickel-based infrastructure configuration in workspace directories:

workspace/
├── config/
│   ├── main.ncl           # Workspace configuration
│   ├── providers.ncl      # Provider definitions
│   └── variables.ncl      # Workspace variables
├── infra/
│   └── servers.ncl        # Infrastructure definitions
└── .workspace/
    └── metadata.toml      # Workspace metadata

Environment Variables

All configuration can be overridden via environment variables:

export PROVISIONING_LOG_LEVEL=debug
export PROVISIONING_WORKSPACE=production
export PROVISIONING_PROVIDER=upcloud
export PROVISIONING_DRY_RUN=true

Variable naming: PROVISIONING_<SECTION>_<KEY> (uppercase with underscores).

Configuration Accessors

The platform provides 476+ configuration accessors for programmatic access:

# Get configuration value
provisioning config get general.log_level

# Set configuration value (workspace-scoped)
provisioning config set providers.default_provider upcloud

# List all configuration
provisioning config list

# Validate configuration
provisioning config validate

Profiles

Configuration supports profiles for different environments:

[profiles.development]
log_level = "debug"
dry_run = true

[profiles.production]
log_level = "warn"
dry_run = false
checkpoint_enabled = true

Activate profile:

provisioning --profile production deploy

Inheritance and Overrides

Workspace configurations inherit from system defaults:

# workspace/config/main.ncl
let parent = import "../../provisioning/schemas/defaults.ncl" in
parent & {
  # Override specific values
  general.log_level = "debug",
  providers.default_provider = "aws",
}

Secrets Management

Sensitive configuration is encrypted using SOPS/Age:

# Encrypt configuration
sops --encrypt --age <public-key> secrets.yaml > secrets.enc.yaml

# Decrypt and use
provisioning deploy --secrets secrets.enc.yaml

Integration with SecretumVault for enterprise secrets management (see Secrets Management).

Configuration Validation

All Nickel-based configuration is validated before use:

# Validate workspace configuration
provisioning config validate

# Check schema compliance
nickel export --format json workspace/config/main.ncl

Type-safety is mandatory - invalid configuration is rejected at load time.
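
For example, a value outside an enum contract is rejected as soon as the file is evaluated (illustrative contract, not one of the platform's schemas):

```nickel
let Environment = [| 'development, 'staging, 'production |] in
{
  # 'testing is not a member of the enum, so `nickel export`
  # fails with a contract violation instead of deploying:
  environment | Environment = 'testing
}
```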

Best Practices

  1. Use Nickel for infrastructure - Type-safe, validated infrastructure definitions
  2. Use TOML for application settings - Simple key-value configuration
  3. Encrypt secrets - Never commit unencrypted credentials
  4. Document overrides - Comment why values differ from defaults
  5. Validate before deploy - Always run config validate before deployment
  6. Version control - Track configuration changes in Git
  7. Profile separation - Isolate development/staging/production configs

Troubleshooting

Configuration Not Loading

Check precedence order:

# Show effective configuration
provisioning config show --debug

# Trace configuration loading
PROVISIONING_LOG_LEVEL=trace provisioning config list

Schema Validation Failures

# Check Nickel syntax
nickel typecheck workspace/config/main.ncl

# Export and inspect
nickel export workspace/config/main.ncl

Environment Variable Issues

# List all PROVISIONING_* variables
env | grep PROVISIONING_

# Clear all provisioning env vars
unset $(env | grep PROVISIONING_ | cut -d= -f1 | xargs)

Schemas Reference

Provisioning uses Nickel schemas for type-safe infrastructure definitions. This reference documents the schema organization, structure, and usage patterns.

Schema Organization

Schemas are organized in provisioning/schemas/:

provisioning/schemas/
├── main.ncl                 # Root schema entry point
├── lib/
│   ├── contracts.ncl        # Type contracts and validators
│   ├── functions.ncl        # Helper functions
│   └── types.ncl            # Common type definitions
├── config/
│   ├── providers.ncl        # Provider configuration schemas
│   ├── settings.ncl         # Platform settings schemas
│   └── workspace.ncl        # Workspace configuration schemas
├── infrastructure/
│   ├── servers.ncl          # Server resource schemas
│   ├── networks.ncl         # Network resource schemas
│   └── storage.ncl          # Storage resource schemas
├── operations/
│   ├── deployment.ncl       # Deployment workflow schemas
│   └── lifecycle.ncl        # Resource lifecycle schemas
├── services/
│   ├── kubernetes.ncl       # Kubernetes schemas
│   └── databases.ncl        # Database schemas
└── integrations/
    ├── cloud_providers.ncl  # Cloud provider integrations
    └── external_services.ncl # External service integrations

Core Contracts

Server Contract

let Server = {
  name
    | doc "Server identifier (must be unique)"
    | String,

  plan
    | doc "Server size (small, medium, large, xlarge)"
    | [| 'small, 'medium, 'large, 'xlarge |],

  provider
    | doc "Cloud provider (upcloud, aws, local)"
    | [| 'upcloud, 'aws, 'local |],

  zone
    | doc "Availability zone"
    | String
    | optional,

  ip_address
    | doc "Public IP address"
    | String
    | optional,

  storage
    | doc "Storage configuration"
    | Array StorageConfig
    | default = [],

  metadata
    | doc "Custom metadata tags"
    | {_ : String}
    | default = {},
}

Network Contract

let Network = {
  name
    | doc "Network identifier"
    | String,

  cidr
    | doc "CIDR block (e.g., 10.0.0.0/16)"
    | String
    | std.contract.from_predicate (std.string.is_match "^([0-9]{1,3}\\.){3}[0-9]{1,3}/[0-9]{1,2}$"),

  subnets
    | doc "Subnet definitions"
    | Array Subnet,

  routing
    | doc "Routing configuration"
    | RoutingConfig
    | optional,
}

Storage Contract

let StorageConfig = {
  size_gb
    | doc "Storage size in GB"
    | Number
    | std.contract.from_predicate (fun n => n > 0),

  type
    | doc "Storage type"
    | [| 'ssd, 'hdd, 'nvme |],

  mount_point
    | doc "Mount path"
    | String
    | optional,

  encrypted
    | doc "Enable encryption"
    | Bool
    | default = false,
}

Workspace Schema

Workspace configuration schema:

let WorkspaceConfig = {
  name
    | doc "Workspace identifier"
    | String,

  environment
    | doc "Environment type"
    | [| 'development, 'staging, 'production |],

  providers
    | doc "Enabled providers"
    | Array [| 'upcloud, 'aws, 'local |]
    | default = ['local],

  infrastructure
    | doc "Infrastructure definitions"
    | {
        servers | Array Server | default = [],
        networks | Array Network | default = [],
        storage | Array StorageConfig | default = [],
      },

  settings
    | doc "Workspace-specific settings"
    | {_ : _}
    | default = {},
}

Provider Schemas

UpCloud Provider

let UpCloudConfig = {
  username
    | doc "UpCloud username"
    | String,

  password
    | doc "UpCloud password (encrypted)"
    | String,

  default_zone
    | doc "Default zone"
    | [| 'fi-hel1, 'fi-hel2, 'de-fra1, 'uk-lon1, 'us-chi1, 'us-sjo1 |]
    | default = 'fi-hel1,

  timeout_seconds
    | doc "API timeout"
    | Number
    | default = 300,
}

AWS Provider

let AWSConfig = {
  access_key_id
    | doc "AWS access key"
    | String,

  secret_access_key
    | doc "AWS secret key (encrypted)"
    | String,

  default_region
    | doc "Default AWS region"
    | String
    | default = "eu-west-1",

  assume_role_arn
    | doc "IAM role ARN"
    | String
    | optional,
}

Service Schemas

Kubernetes Schema

let KubernetesCluster = {
  name
    | doc "Cluster name"
    | String,

  version
    | doc "Kubernetes version"
    | String
    | std.contract.from_predicate (std.string.is_match "^v[0-9]+\\.[0-9]+\\.[0-9]+$"),

  control_plane
    | doc "Control plane configuration"
    | {
        nodes | Number | std.contract.from_predicate (fun n => n > 0),
        plan | [| 'small, 'medium, 'large |],
      },

  workers
    | doc "Worker node pools"
    | Array NodePool,

  networking
    | doc "Network configuration"
    | {
        pod_cidr | String,
        service_cidr | String,
        cni | [| 'calico, 'cilium, 'flannel |] | default = 'cilium,
      },

  addons
    | doc "Cluster addons"
    | Array [| 'metrics-server, 'ingress-nginx, 'cert-manager |]
    | default = [],
}

Validation Functions

Custom validation functions in lib/contracts.ncl:

let is_valid_hostname = fun name =>
  std.string.is_match "^[a-z0-9]([-a-z0-9]*[a-z0-9])?$" name
in

let is_valid_port = fun port =>
  std.number.fract port == 0 && port >= 1 && port <= 65535
in

let is_valid_email = fun email =>
  std.string.is_match "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$" email
in
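
Predicates like these become contracts via `std.contract.from_predicate`; a sketch of how a port check might be applied (the helper is redefined here so the fragment stands alone):

```nickel
let is_valid_port = fun port => port >= 1 && port <= 65535 in
{
  port
    | std.contract.from_predicate is_valid_port
    | default = 8080
}
```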

Merging and Composition

Schemas support composition through record merging:

let base_server = {
  plan | default = 'medium,
  provider = 'upcloud,
  storage | default = [],
} in

let production_server = base_server & {
  plan = 'large,
  storage = [{size_gb = 100, type = 'ssd}],
} in

production_server

Contract Enforcement

Type checking is enforced at load time:

# Typecheck schema
nickel typecheck provisioning/schemas/main.ncl

# Export with validation
nickel export --format json workspace/infra/servers.ncl

Invalid configurations are rejected before deployment.

Best Practices

  1. Define contracts first - Start with type contracts before implementation
  2. Use enums for choices - Leverage [| 'option1, 'option2 |] for fixed sets
  3. Document everything - Use | doc "description" annotations
  4. Validate early - Run nickel typecheck before deployment
  5. Compose, don’t duplicate - Use record merging for common patterns
  6. Version schemas - Track schema changes alongside infrastructure
  7. Test contracts - Validate edge cases and constraints

Providers

Providers are abstraction layers for interacting with cloud platforms and local infrastructure. Provisioning supports multiple providers through a unified interface.

Available Providers

UpCloud Provider

Production-ready cloud provider for European infrastructure.

Configuration:

{
  providers.upcloud = {
    username = "your-username",
    password = std.secret "UPCLOUD_PASSWORD",
    default_zone = 'fi-hel1,
    timeout_seconds = 300,
  }
}

Supported zones:

  • fi-hel1, fi-hel2 - Helsinki, Finland
  • de-fra1 - Frankfurt, Germany
  • uk-lon1 - London, UK
  • us-chi1 - Chicago, USA
  • us-sjo1 - San Jose, USA

Resources: Servers, networks, storage, firewalls, load balancers

AWS Provider

Amazon Web Services integration for global cloud infrastructure.

Configuration:

{
  providers.aws = {
    access_key_id = std.secret "AWS_ACCESS_KEY_ID",
    secret_access_key = std.secret "AWS_SECRET_ACCESS_KEY",
    default_region = "eu-west-1",
  }
}

Resources: EC2, VPCs, EBS, security groups, RDS, S3

Local Provider

Local infrastructure for development and testing.

Configuration:

{
  providers.local = {
    backend = 'libvirt,  # or 'docker, 'podman
    storage_pool = "/var/lib/libvirt/images",
  }
}

Backends: libvirt (KVM/QEMU), docker, podman

Multi-Cloud Deployments

Deploy infrastructure across multiple providers:

{
  servers = [
    {name = "web-frontend", provider = 'upcloud, zone = "fi-hel1", plan = 'medium},
    {name = "api-backend", provider = 'aws, zone = "eu-west-1a", plan = '"t3.large"},
  ]
}

Provider Abstraction

Abstract resource definitions work across providers:

let server_config = fun server_name cloud => {
  name = server_name,
  provider = cloud,
  plan = 'medium,  # Automatically translated per provider
  storage = [{size_gb = 50, type = 'ssd}],
} in

server_config "web-frontend" 'upcloud

Plan translation:

| Abstract | UpCloud    | AWS       | Local  |
|----------|------------|-----------|--------|
| small    | 1xCPU-1GB  | t3.micro  | 1 vCPU |
| medium   | 2xCPU-4GB  | t3.medium | 2 vCPU |
| large    | 4xCPU-8GB  | t3.large  | 4 vCPU |
| xlarge   | 8xCPU-16GB | t3.xlarge | 8 vCPU |
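
The translation can be pictured as a lookup table keyed by provider and abstract plan (a hypothetical shape; the platform's internal mapping may differ):

```nickel
let plan_map = {
  upcloud = { small = "1xCPU-1GB", medium = "2xCPU-4GB", large = "4xCPU-8GB" },
  aws = { small = "t3.micro", medium = "t3.medium", large = "t3.large" },
} in

# Resolve the concrete instance type for one provider/plan pair
plan_map.aws.medium  # "t3.medium"
```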

Best Practices

  1. Use abstract plans - Avoid provider-specific instance types
  2. Encrypt credentials - Always use encrypted secrets for API keys
  3. Test locally first - Validate configurations with local provider
  4. Document provider choices - Comment why specific providers are used
  5. Monitor costs - Track cloud provider spending

Task Services

Task services are installable infrastructure components that provide specific functionality. Provisioning includes 30+ task services for databases, orchestration, monitoring, and more.

Categories

Kubernetes & Container Orchestration

kubernetes - Complete Kubernetes cluster deployment

  • Control plane setup
  • Worker node pools
  • CNI configuration (Calico, Cilium, Flannel)
  • Addon management (metrics-server, ingress-nginx, cert-manager)

containerd - Container runtime configuration

  • Systemd integration
  • Storage driver configuration
  • Runtime class support

docker - Docker engine installation

  • Docker Compose integration
  • Registry configuration

Databases

postgresql - PostgreSQL database server

  • Replication setup
  • Backup automation
  • Performance tuning

mysql - MySQL/MariaDB deployment

  • Cluster configuration
  • Backup strategies

mongodb - MongoDB database

  • Replica sets
  • Sharding configuration

redis - Redis in-memory store

  • Persistence configuration
  • Cluster mode

Storage

rook-ceph - Cloud-native storage orchestrator

  • Block storage (RBD)
  • Object storage (S3-compatible)
  • Shared filesystem (CephFS)

minio - S3-compatible object storage

  • Distributed mode
  • Versioning and lifecycle policies

Monitoring & Observability

prometheus - Metrics collection and alerting

  • Service discovery
  • Alerting rules
  • Long-term storage

grafana - Metrics visualization

  • Dashboard provisioning
  • Data source configuration

loki - Log aggregation system

  • Log collection
  • Query language

Networking

cilium - eBPF-based networking and security

  • Network policies
  • Load balancing
  • Service mesh capabilities

calico - Network policy engine

  • BGP networking
  • IP-in-IP tunneling

nginx - Web server and reverse proxy

  • Load balancing
  • TLS termination

Security

vault - Secrets management (HashiCorp Vault)

  • Secret storage
  • Dynamic secrets
  • Encryption as a service

cert-manager - TLS certificate automation

  • Let’s Encrypt integration
  • Certificate renewal

Task Service Definition

Task services are defined in provisioning/extensions/taskservs/:

taskservs/
└── kubernetes/
    ├── service.ncl           # Service schema
    ├── install.nu            # Installation script
    ├── configure.nu          # Configuration script
    ├── health-check.nu       # Health validation
    └── README.md

Using Task Services

Installation

{
  task_services = [
    {
      name = "kubernetes",
      version = "v1.28.0",
      config = {
        control_plane = {nodes = 3, plan = 'medium},
        workers = [{name = "pool-1", nodes = 3, plan = 'large}],
        networking = {cni = 'cilium},
      }
    },
    {
      name = "prometheus",
      version = "latest",
      config = {retention = "30d", storage_size_gb = 100}
    }
  ]
}

CLI Commands

# List available task services
provisioning taskserv list

# Show task service details
provisioning taskserv show kubernetes

# Install task service
provisioning taskserv install kubernetes

# Check task service health
provisioning taskserv health kubernetes

# Uninstall task service
provisioning taskserv uninstall kubernetes

Custom Task Services

Create custom task services:

provisioning/extensions/taskservs/my-service/
├── service.ncl           # Service definition
├── install.nu            # Installation logic
├── configure.nu          # Configuration logic
├── health-check.nu       # Health checks
└── README.md

service.ncl schema:

{
  name = "my-service",
  version = "1.0.0",
  description = "Custom service description",
  dependencies = ["kubernetes"],  # Optional dependencies
  config_schema = {
    port | Number | default = 8080,
    replicas | Number | default = 3,
  }
}

install.nu implementation:

export def "taskserv install" [config: record] {
  # Installation logic
  print $"Installing ($config.name)..."

  # Deploy resources
  kubectl apply -f deployment.yaml

  {status: "installed"}
}

Task Service Lifecycle

  1. Validation - Check dependencies and configuration
  2. Installation - Execute install script
  3. Configuration - Apply service configuration
  4. Health Check - Verify service is running
  5. Ready - Service available for use
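These stages can be sketched as a small runner that stops at the first failing stage. The stage names and the `run_lifecycle` helper below are illustrative only, not the platform's actual implementation:

```python
# Illustrative lifecycle runner; stage names and structure are hypothetical.
LIFECYCLE = ["validate", "install", "configure", "health_check"]

def run_lifecycle(service, stages):
    """Run each stage in order; stop and report the first failing stage."""
    for stage in LIFECYCLE:
        if not stages[stage](service):
            return {"service": service, "status": "failed", "stage": stage}
    return {"service": service, "status": "ready"}
```

A service only reaches `ready` when every stage has succeeded; a failure reports which stage broke.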

Dependencies

Task services can declare dependencies:

{
  name = "grafana",
  dependencies = ["prometheus"],  # Installed first
}

Provisioning automatically resolves dependency order.
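A resolver of this kind amounts to a topological sort over the declared dependencies. The sketch below uses Python's standard-library `graphlib` for illustration; it is not the platform's actual resolver:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def install_order(services):
    """services maps each task service to its list of dependencies;
    returns an installation order with dependencies first."""
    return list(TopologicalSorter(services).static_order())
```

For `{"grafana": ["prometheus"], "prometheus": []}` this yields `prometheus` before `grafana`, matching the example above.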

Health Checks

Each task service provides health validation:

export def "taskserv health" [] {
  let pods = (kubectl get pods -l app=my-service -o json | from json)

  if ($pods.items | all { |p| $p.status.phase == "Running" }) {
    {status: "healthy"}
  } else {
    {status: "unhealthy", reason: "pods not running"}
  }
}

Best Practices

  1. Define schemas - Use Nickel schemas for task service configuration
  2. Declare dependencies - Explicit dependency declaration
  3. Idempotent installs - Installation should be repeatable
  4. Health checks - Implement comprehensive health validation
  5. Version pinning - Specify exact versions for reproducibility
  6. Document configuration - Provide clear configuration examples


Clusters

Clusters are coordinated groups of services deployed together. Provisioning provides cluster definitions for common deployment patterns.

Available Clusters

Web Cluster

Production-ready web application deployment with load balancing, TLS, and monitoring.

Components:

  • Nginx load balancer
  • Application servers (configurable count)
  • PostgreSQL database
  • Redis cache
  • Prometheus monitoring
  • Let’s Encrypt TLS certificates

Configuration:

{
  clusters = [{
    name = "web-production",
    type = 'web,
    config = {
      app_servers = 3,
      load_balancer = {
        public_ip = true,
        tls_enabled = true,
        domain = "example.com"
      },
      database = {
        size = 'medium,
        replicas = 2,
        backup_enabled = true
      },
      cache = {
        size = 'small,
        persistence = true
      }
    }
  }]
}

OCI Registry Cluster

Private container registry with S3-compatible storage and authentication.

Components:

  • Harbor registry
  • MinIO object storage
  • PostgreSQL database
  • Redis cache
  • TLS termination

Configuration:

{
  clusters = [{
    name = "registry-private",
    type = 'oci_registry,
    config = {
      domain = "registry.example.com",
      storage = {
        backend = 'minio,
        size_gb = 500,
        replicas = 3
      },
      authentication = {
        method = 'ldap,  # or 'database, 'oidc
        admin_password = std.secret "REGISTRY_ADMIN_PASSWORD"
      }
    }
  }]
}

Kubernetes Cluster

Multi-node Kubernetes cluster with networking, storage, and monitoring.

Components:

  • Control plane nodes
  • Worker node pools
  • Cilium CNI
  • Rook-Ceph storage
  • Metrics server
  • Ingress controller

Configuration:

{
  clusters = [{
    name = "k8s-production",
    type = 'kubernetes,
    config = {
      control_plane = {
        nodes = 3,
        plan = 'medium,
        high_availability = true
      },
      node_pools = [
        {
          name = "general",
          nodes = 5,
          plan = 'large,
          labels = {workload = "general"}
        },
        {
          name = "gpu",
          nodes = 2,
          plan = 'xlarge,
          labels = {workload = "ml"}
        }
      ],
      networking = {
        cni = 'cilium,
        pod_cidr = "10.42.0.0/16",
        service_cidr = "10.43.0.0/16"
      },
      storage = {
        provider = 'rook-ceph,
        default_storage_class = "ceph-block"
      }
    }
  }]
}

Cluster Deployment

CLI Commands

# List available cluster types
provisioning cluster types

# Show cluster configuration template
provisioning cluster template web

# Deploy cluster
provisioning cluster deploy web-production

# Check cluster health
provisioning cluster health web-production

# Scale cluster
provisioning cluster scale web-production --app-servers 5

# Destroy cluster
provisioning cluster destroy web-production

Deployment Lifecycle

  1. Validation - Validate cluster configuration
  2. Infrastructure - Provision servers, networks, storage
  3. Services - Install and configure task services
  4. Integration - Connect services together
  5. Health Check - Verify cluster health
  6. Ready - Cluster operational

Cluster Orchestration

Clusters use dependency graphs for orchestration:

Web Cluster Dependency Graph:

servers ──┐
          ├──> database ──┐
networks ─┘               ├──> app_servers ──> load_balancer
                          │
                          ├──> cache ──────────┘
                          │
                          └──> monitoring

Services are deployed in dependency order with parallel execution where possible.
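Scheduling in dependency order with parallelism can be sketched by grouping services into waves whose dependencies are already satisfied. This is an illustrative sketch, not the orchestrator's code:

```python
def execution_waves(graph):
    """graph: {component: set of dependencies} -> list of waves,
    where every component in a wave can deploy in parallel."""
    graph = {n: set(deps) for n, deps in graph.items()}
    waves = []
    done = set()
    while graph:
        # Components whose dependencies have all completed
        ready = {n for n, deps in graph.items() if deps <= done}
        if not ready:
            raise ValueError("dependency cycle detected")
        waves.append(sorted(ready))
        done |= ready
        for n in ready:
            del graph[n]
    return waves
```

Applied to the web cluster graph above, servers and networks form the first wave, the database the second, app servers, cache, and monitoring the third, and the load balancer the last.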

Custom Cluster Definitions

Create custom cluster types:

provisioning/extensions/clusters/
└── my-cluster/
    ├── cluster.ncl           # Cluster definition
    ├── deploy.nu             # Deployment script
    ├── health-check.nu       # Health validation
    └── README.md

cluster.ncl schema:

{
  name = "my-cluster",
  version = "1.0.0",
  description = "Custom cluster type",
  components = {
    servers = [{name = "app", count = 3, plan = 'medium}],
    services = ["nginx", "postgresql", "redis"],
  },
  config_schema = {
    domain | String,
    replicas | Number | default = 3,
  }
}

Cluster Management

Scaling

Scale cluster components:

# Scale application servers
provisioning cluster scale web-production --component app_servers --count 5

# Scale database replicas
provisioning cluster scale web-production --component database --replicas 3

Updates

Rolling updates without downtime:

# Update application version
provisioning cluster update web-production --app-version 2.0.0

# Update infrastructure (e.g., server plans)
provisioning cluster update web-production --plan large

Backup and Recovery

# Create cluster backup
provisioning cluster backup web-production

# Restore from backup
provisioning cluster restore web-production --backup 2024-01-15-snapshot

# List backups
provisioning cluster backups web-production

Monitoring

Cluster health monitoring:

# Overall cluster health
provisioning cluster health web-production

# Component health
provisioning cluster health web-production --component database

# Metrics
provisioning cluster metrics web-production

Health checks validate:

  • All services running
  • Network connectivity
  • Storage availability
  • Resource utilization

Best Practices

  1. Use predefined clusters - Leverage built-in cluster types
  2. Define dependencies - Explicit service dependencies
  3. Implement health checks - Comprehensive validation
  4. Plan for scaling - Design clusters for horizontal scaling
  5. Automate backups - Regular backup schedules
  6. Monitor resources - Track resource utilization
  7. Test disaster recovery - Validate backup/restore procedures


Batch Workflows

Batch workflows orchestrate complex multi-step operations across multiple clouds and services with dependency resolution, parallel execution, and checkpoint recovery.

Overview

Batch workflows enable:

  • Multi-cloud infrastructure orchestration
  • Complex deployment pipelines
  • Dependency-driven execution
  • Parallel task execution
  • Checkpoint and recovery
  • Rollback on failures

Workflow Definition

Workflows are defined in Nickel:

{
  workflows = [{
    name = "multi-cloud-deployment",
    description = "Deploy application across UpCloud and AWS",
    steps = [
      {
        name = "provision-upcloud",
        type = 'provision,
        provider = 'upcloud,
        resources = {
          servers = [{name = "web-eu", plan = 'medium, zone = "fi-hel1"}]
        }
      },
      {
        name = "provision-aws",
        type = 'provision,
        provider = 'aws,
        resources = {
          servers = [{name = "web-us", plan = 't3.medium, zone = "us-east-1a"}]
        }
      },
      {
        name = "deploy-application",
        type = 'task,
        depends_on = ["provision-upcloud", "provision-aws"],
        tasks = ["install-kubernetes", "deploy-app"]
      },
      {
        name = "configure-dns",
        type = 'configure,
        depends_on = ["deploy-application"],
        config = {
          records = [
            {name = "eu.example.com", target = "web-eu"},
            {name = "us.example.com", target = "web-us"}
          ]
        }
      }
    ],
    rollback_on_failure = true,
    checkpoint_enabled = true
  }]
}

Dependency Resolution

Workflows automatically resolve dependencies:

Execution Graph:

provision-upcloud ──┐
                    ├──> deploy-application ──> configure-dns
provision-aws ──────┘

Steps provision-upcloud and provision-aws run in parallel. deploy-application waits for both to complete.

Step Types

Provision Steps

Create infrastructure resources:

{
  name = "create-servers",
  type = 'provision,
  provider = 'upcloud,
  resources = {
    servers = [...],
    networks = [...],
    storage = [...]
  }
}

Task Steps

Execute task services:

{
  name = "install-k8s",
  type = 'task,
  tasks = ["kubernetes", "helm", "monitoring"]
}

Configure Steps

Apply configuration changes:

{
  name = "setup-networking",
  type = 'configure,
  config = {
    firewalls = [...],
    routes = [...],
    dns = [...]
  }
}

Validate Steps

Verify conditions before proceeding:

{
  name = "health-check",
  type = 'validate,
  checks = [
    {type = 'http, url = "https://app.example.com", expected_status = 200},
    {type = 'command, command = "kubectl get nodes", expected_output = "Ready"}
  ]
}

Execution Control

Parallel Execution

Steps without dependencies run in parallel:

steps = [
  {name = "provision-eu", ...},  # Runs in parallel
  {name = "provision-us", ...},  # Runs in parallel
  {name = "provision-asia", ...} # Runs in parallel
]

Configure parallelism:

{
  max_parallel_tasks = 4,  # Max concurrent steps
  timeout_seconds = 3600   # Step timeout
}

Conditional Execution

Execute steps based on conditions:

{
  name = "scale-up",
  type = 'task,
  condition = {
    type = 'expression,
    expression = "cpu_usage > 80"
  }
}

Retry Logic

Automatically retry failed steps:

{
  name = "deploy-app",
  type = 'task,
  retry = {
    max_attempts = 3,
    backoff = 'exponential,  # or 'linear, 'constant
    initial_delay_seconds = 10
  }
}
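The backoff behavior behind these settings can be sketched as follows; the `retry` helper is illustrative and mirrors the `max_attempts`, `backoff`, and `initial_delay_seconds` fields above:

```python
import time

def retry(step, max_attempts=3, initial_delay=10, backoff="exponential"):
    """Run `step` until it succeeds or attempts are exhausted.

    Delay before the next attempt: exponential doubles it, linear adds
    the initial delay, constant keeps it fixed.
    """
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts; surface the failure
            time.sleep(delay)
            if backoff == "exponential":
                delay *= 2
            elif backoff == "linear":
                delay += initial_delay
```

With the configuration above, a step failing twice waits 10s and then 20s before its third and final attempt.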

Checkpoint and Recovery

Checkpointing

Workflows automatically checkpoint state:

# Enable checkpointing
provisioning workflow run multi-cloud --checkpoint

# Checkpoint saved at each step completion

Recovery

Resume from last successful checkpoint:

# Workflow failed at step 3
# Resume from checkpoint
provisioning workflow resume multi-cloud --from-checkpoint latest

# Resume from specific checkpoint
provisioning workflow resume multi-cloud --checkpoint-id abc123
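Step-level checkpointing of this kind can be sketched by persisting the set of completed steps after each success; the file name and JSON format below are assumptions, not the platform's actual checkpoint format:

```python
import json
import os

def run_with_checkpoints(steps, path="workflow.checkpoint.json"):
    """steps: ordered {name: callable}. Completed step names are persisted
    after each success, so a re-run resumes after the last checkpoint."""
    done = set()
    if os.path.exists(path):
        with open(path) as f:
            done = set(json.load(f))
    for name, step in steps.items():
        if name in done:
            continue  # completed in a previous run; skip on resume
        step()
        done.add(name)
        with open(path, "w") as f:  # checkpoint after each step
            json.dump(sorted(done), f)
```

If the workflow fails at step 3, re-running it skips the first two checkpointed steps and retries from the failure point.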

Rollback

Automatic Rollback

Rollback on failure:

{
  rollback_on_failure = true,
  rollback_steps = [
    {name = "destroy-resources", type = 'destroy},
    {name = "restore-config", type = 'restore}
  ]
}

Manual Rollback

# Rollback to previous state
provisioning workflow rollback multi-cloud

# Rollback to specific checkpoint
provisioning workflow rollback multi-cloud --checkpoint-id abc123

Workflow Management

CLI Commands

# List workflows
provisioning workflow list

# Show workflow details
provisioning workflow show multi-cloud

# Run workflow
provisioning workflow run multi-cloud

# Check workflow status
provisioning workflow status multi-cloud

# View workflow logs
provisioning workflow logs multi-cloud

# Cancel running workflow
provisioning workflow cancel multi-cloud

Workflow State

Workflows track execution state:

  • pending - Not yet started
  • running - Currently executing
  • completed - Successfully finished
  • failed - Execution failed
  • rolling_back - Performing rollback
  • cancelled - Manually cancelled
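The transitions between these states can be sketched as a small table; the allowed transitions below are a plausible reading of the list above, not the platform's documented state machine:

```python
# Hypothetical transition table for the workflow states listed above.
TRANSITIONS = {
    "pending": {"running", "cancelled"},
    "running": {"completed", "failed", "cancelled"},
    "failed": {"rolling_back"},
    "rolling_back": {"failed", "cancelled"},
    "completed": set(),   # terminal
    "cancelled": set(),   # terminal
}

def transition(state, new_state):
    """Move to new_state, rejecting transitions the table forbids."""
    if new_state not in TRANSITIONS[state]:
        raise ValueError(f"invalid transition: {state} -> {new_state}")
    return new_state
```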

Advanced Features

Dynamic Workflows

Generate workflows programmatically:

let regions = ["fi-hel1", "de-fra1", "uk-lon1"] in
{
  steps = std.array.map (fun region => {
    name = "provision-" ++ region,
    type = 'provision,
    resources = {servers = [{zone = region, ...}]}
  }) regions
}

Workflow Templates

Reusable workflow templates:

let DeploymentTemplate = fun app_name regions => {
  name = "deploy-" ++ app_name,
  steps = std.array.map (fun region => {
    name = "deploy-" ++ region,
    type = 'task,
    tasks = ["deploy-app"],
    config = {app_name = app_name, region = region}
  }) regions
}

# Use template
{
  workflows = [
    DeploymentTemplate "frontend" ["eu", "us"],
    DeploymentTemplate "backend" ["eu", "us", "asia"]
  ]
}

Notifications

Send notifications on workflow events:

{
  notifications = {
    on_success = {
      type = 'slack,
      webhook_url = std.secret "SLACK_WEBHOOK",
      message = "Deployment completed successfully"
    },
    on_failure = {
      type = 'email,
      to = ["ops@example.com"],
      subject = "Workflow failed"
    }
  }
}

Best Practices

  1. Define dependencies explicitly - Clear dependency graph
  2. Enable checkpointing - Critical for long-running workflows
  3. Implement rollback - Always have rollback strategy
  4. Use validation steps - Verify state before proceeding
  5. Configure retries - Handle transient failures
  6. Monitor execution - Track workflow progress
  7. Test workflows - Validate with dry-run mode

Troubleshooting

Workflow Stuck

# Check workflow status
provisioning workflow status <workflow> --verbose

# View logs
provisioning workflow logs <workflow> --tail 100

# Cancel and restart
provisioning workflow cancel <workflow>
provisioning workflow run <workflow>

Step Failures

# View failed step details
provisioning workflow show <workflow> --step <step-name>

# Retry failed step
provisioning workflow retry <workflow> --step <step-name>

# Skip failed step
provisioning workflow skip <workflow> --step <step-name>


Version Management

Nickel-based version management for infrastructure components, providers, and task services ensures consistent, reproducible deployments.

Overview

Version management in Provisioning:

  • Nickel schemas define version constraints
  • Semantic versioning (semver) support
  • Version locking for reproducibility
  • Compatibility validation
  • Update strategies

Version Constraints

Define version requirements in Nickel:

{
  task_services = [
    {
      name = "kubernetes",
      version = ">=1.28.0, <1.30.0",  # Range constraint
    },
    {
      name = "prometheus",
      version = "~2.45.0",  # Patch versions allowed
    },
    {
      name = "grafana",
      version = "^10.0.0",  # Minor versions allowed
    },
    {
      name = "nginx",
      version = "1.25.3",  # Exact version
    }
  ]
}

Constraint Operators

Operator   Meaning            Example          Matches
=          Exact version      =1.28.0          1.28.0 only
>=         Greater or equal   >=1.28.0         1.28.0, 1.29.0, 2.0.0
<=         Less or equal      <=1.30.0         1.28.0, 1.30.0
>          Greater than       >1.28.0          1.29.0, 2.0.0
<          Less than          <1.30.0          1.28.0, 1.29.0
~          Patch updates      ~1.28.0          1.28.x
^          Minor updates      ^1.28.0          1.x.x
,          AND constraint     >=1.28, <1.30    1.28.x, 1.29.x
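Evaluating these operators can be sketched with a small matcher; this is an illustrative sketch, not the platform's constraint parser:

```python
def parse(v):
    """'1.28.3' -> (1, 28, 3) for tuple comparison."""
    return tuple(int(x) for x in v.split("."))

def matches(version, constraint):
    """Check a version against a comma-separated constraint string."""
    v = parse(version)
    for part in (c.strip() for c in constraint.split(",")):
        if part.startswith(">="):
            ok = v >= parse(part[2:])
        elif part.startswith("<="):
            ok = v <= parse(part[2:])
        elif part.startswith(">"):
            ok = v > parse(part[1:])
        elif part.startswith("<"):
            ok = v < parse(part[1:])
        elif part.startswith("~"):  # patch updates only
            base = parse(part[1:])
            ok = v[:2] == base[:2] and v >= base
        elif part.startswith("^"):  # minor + patch updates
            base = parse(part[1:])
            ok = v[0] == base[0] and v >= base
        else:  # "=1.28.0" or bare "1.28.0": exact match
            ok = v == parse(part.lstrip("="))
        if not ok:
            return False
    return True
```

For example, `matches("1.29.0", ">=1.28.0, <1.30.0")` holds, while `matches("1.30.0", ">=1.28.0, <1.30.0")` does not.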

Version Locking

Generate lock file for reproducible deployments:

# Generate lock file
provisioning version lock

# Creates versions.lock.ncl with exact versions

versions.lock.ncl:

{
  task_services = {
    kubernetes = "1.28.3",
    prometheus = "2.45.2",
    grafana = "10.0.5",
    nginx = "1.25.3"
  },
  providers = {
    upcloud = "1.2.0",
    aws = "3.5.1"
  }
}

Use lock file:

let locked = import "versions.lock.ncl" in
{
  task_services = [
    {name = "kubernetes", version = locked.task_services.kubernetes}
  ]
}

Version Updates

Check for Updates

# Check available updates
provisioning version check

# Show outdated components
provisioning version outdated

Output:

Component    Current  Latest   Update Available
kubernetes   1.28.0   1.29.2   Minor update
prometheus   2.45.0   2.47.0   Minor update
grafana      10.0.0   11.0.0   Major update (breaking)

Update Strategies

Conservative (patch only):

{
  update_policy = 'conservative,  # Only patch updates
}

Moderate (minor updates):

{
  update_policy = 'moderate,  # Patch + minor updates
}

Aggressive (all updates):

{
  update_policy = 'aggressive,  # All updates including major
}

Performing Updates

# Update all components (respecting constraints)
provisioning version update

# Update specific component
provisioning version update kubernetes

# Update to specific version
provisioning version update kubernetes --version 1.29.0

# Dry-run (show what would update)
provisioning version update --dry-run

Compatibility Validation

Validate version compatibility:

# Check compatibility
provisioning version validate

# Check specific component
provisioning version validate kubernetes

Compatibility rules defined in schemas:

{
  name = "grafana",
  version = "10.0.0",
  compatibility = {
    prometheus = ">=2.40.0",  # Requires Prometheus 2.40+
    kubernetes = ">=1.24.0"   # Requires Kubernetes 1.24+
  }
}

Version Resolution

When constraints from multiple sources conflict, versions are resolved in priority order:

  1. Exact version - Highest priority
  2. Compatibility constraints - From dependencies
  3. User constraints - From configuration
  4. Latest compatible - Within constraints

Example resolution:

# Component A requires: kubernetes >=1.28.0
# Component B requires: kubernetes <1.30.0
# User specifies: kubernetes ^1.28.0

# Resolved: kubernetes 1.29.x (latest compatible)
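The resolution can be sketched as: collect every constraint, then pick the newest available version that satisfies all of them. The helper below handles only the operators used in this example and is purely illustrative:

```python
def _v(s):
    return tuple(int(x) for x in s.split("."))

def _ok(version, constraint):
    """Minimal checker for >=, <, and ^ constraints (illustrative)."""
    v = _v(version)
    for part in (c.strip() for c in constraint.split(",")):
        if part.startswith(">="):
            if not v >= _v(part[2:]): return False
        elif part.startswith("<"):
            if not v < _v(part[1:]): return False
        elif part.startswith("^"):
            base = _v(part[1:])
            if not (v[0] == base[0] and v >= base): return False
    return True

def resolve(available, constraints):
    """Pick the newest available version satisfying every constraint."""
    candidates = [v for v in available if all(_ok(v, c) for c in constraints)]
    return max(candidates, key=_v) if candidates else None
```

With `1.28.0`, `1.28.3`, `1.29.2`, and `1.30.0` available, the three constraints from the example resolve to `1.29.2`: the latest compatible version.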

Pinning Versions

Pin critical components:

{
  task_services = [
    {
      name = "kubernetes",
      version = "1.28.3",
      pinned = true  # Never auto-update
    }
  ]
}

Version Rollback

Rollback to previous versions:

# Show version history
provisioning version history

# Rollback to previous version
provisioning version rollback kubernetes

# Rollback to specific version
provisioning version rollback kubernetes --version 1.28.0

Best Practices

  1. Use version constraints - Avoid latest tag
  2. Lock versions - Generate and commit lock files
  3. Test updates - Validate in non-production first
  4. Pin critical components - Prevent unexpected updates
  5. Document compatibility - Specify version requirements
  6. Monitor updates - Track new releases
  7. Gradual rollout - Update incrementally

Version Metadata

Access version information programmatically:

# Show component versions
provisioning version list

# Export versions to JSON
provisioning version export --format json

# Compare versions
provisioning version compare <component> <version1> <version2>

Integration with CI/CD

# .gitlab-ci.yml example
deploy:
  script:
    - provisioning version lock --verify  # Verify lock file
    - provisioning version validate       # Check compatibility
    - provisioning deploy                 # Deploy with locked versions

Troubleshooting

Version Conflicts

# Show dependency tree
provisioning version tree

# Identify conflicting constraints
provisioning version conflicts

Update Failures

# Check why update failed
provisioning version update kubernetes --verbose

# Force update (override constraints)
provisioning version update kubernetes --force --version 1.30.0



Platform Features

Complete documentation for the 12 core Provisioning platform capabilities enabling enterprise infrastructure as code across multiple clouds.

Overview

Provisioning provides comprehensive features for:

  • Workspace organization - Primary mode for grouping infrastructure, configs, schemas, and extensions with complete isolation
  • Intelligent CLI - Modular architecture with 80+ keyboard shortcuts, decentralized command registration, 84% code reduction
  • Type-safe configuration - Nickel as source of truth for all infrastructure definitions with mandatory validation
  • Batch operations - DAG scheduling, parallel execution, multi-cloud workflows with dependency resolution
  • Hybrid orchestration - Execute across Rust and Nushell with file-based persistence and atomic operations
  • Interactive guides - Step-by-step guided infrastructure deployment with validation and error recovery
  • Testing framework - Container-based test environments for validating infrastructure configurations
  • Platform installer - TUI and unattended installation with provider setup and configuration management
  • Security system - Complete v4.0.0 with authentication, authorization, encryption, secrets management, audit logging
  • Daemon acceleration - 50x performance improvement for script-heavy workloads via persistent Rust process
  • Intelligent detection - Automated analysis detecting cost, compliance, performance, security, and reliability issues
  • Extension registry - Central marketplace for providers, task services, plugins, and clusters with versioning

Feature Guides

Organization and Management

  • Workspace Management - Workspace mode, grouping, multi-tenancy, isolation, customization

  • CLI Architecture - Modular design, 80+ shortcuts, decentralized registration, dynamic subcommands, 84% code reduction

  • Configuration System - Nickel type-safe configuration, hierarchical loading, profiles, validation

Workflow and Operations

  • Batch Workflows - DAG scheduling, parallel execution, conditional logic, error handling, multi-cloud, dependency resolution

  • Orchestrator System - Hybrid Rust/Nushell, file-based persistence, atomic operations, event-driven

  • Provisioning Daemon - TCP service, 50x performance, connection pooling, LRU caching, graceful shutdown

Developer and Automation Features

  • Interactive Guides - Guided deployment, prompts, validation, error recovery, progress tracking

  • Test Environment - Container-based testing, sandbox isolation, validation, integration testing

  • Extension Registry - Marketplace for providers, task services, plugins, clusters, versioning, dependencies

Platform Capabilities

  • Platform Installer - TUI and unattended modes, provider setup, workspace creation, configuration management

  • Security System - v4.0.0: JWT/OAuth, Cedar RBAC, MFA, audit logging, encryption, secrets management

  • Detector System - Cost optimization, compliance, performance analysis, security detection, reliability assessment

  • Nushell Plugins - 17 plugins: tera, nickel, fluentd, secretumvault, 10-50x performance gains

  • Version Management - Semantic versioning, dependency resolution, compatibility, deprecation, upgrade workflows

Feature Categories

Category              Features                                                       Use Case
Core                  Workspace Management, CLI Architecture, Configuration System   Organization, command discovery, type-safety
Operations            Batch Workflows, Orchestrator, Version Management              Multi-cloud, DAG scheduling, persistence
Performance           Provisioning Daemon, Nushell Plugins                           Script acceleration, 10-50x speedup
Quality & Testing     Test Environment, Extension Registry                           Configuration validation, distribution
Setup & Installation  Platform Installer                                             Installation, initial configuration
Intelligence          Detector System                                                Analysis, anomaly detection, cost optimization
Security              Security System (complete v4.0.0)                              Authentication, authorization, encryption
User Experience       Interactive Guides                                             Guided deployment, learning

Quick Navigation

I want to organize my infrastructure

Start with Workspace Management - primary organizational mode with isolation and customization.

I want faster command execution

Use Provisioning Daemon - 50x performance improvement for scripts through persistent process and caching.

I want to automate deployment

Learn Batch Workflows - DAG scheduling and multi-cloud orchestration with error handling.

I need to ensure security

Review Security System - complete authentication, authorization, encryption, audit logging.

I want to validate configurations

Check Test Environment - container-based sandbox testing and policy validation.

I need to extend capabilities

See Extension Registry - marketplace for providers, task services, plugins, clusters.

I need to find infrastructure issues

Use Detector System - automated cost, compliance, performance, and security analysis.

Integration with Platform

All features are integrated via:

  • CLI commands - Invoke from Nushell or bash
  • REST APIs - Integrate with external systems
  • Nushell scripting - Build custom automation
  • Nickel configuration - Type-safe definitions
  • Extensions - Add custom providers and services

Related documentation:

  • Architecture Details → See provisioning/docs/src/architecture/
  • Development Guides → See provisioning/docs/src/development/
  • API Reference → See provisioning/docs/src/api-reference/
  • Operation Guides → See provisioning/docs/src/operations/
  • Security Details → See provisioning/docs/src/security/
  • Practical Examples → See provisioning/docs/src/examples/

Workspace Management

Workspaces are the default organizational unit for all infrastructure work in Provisioning. Every infrastructure project, deployment environment, or isolated configuration lives within a workspace. This workspace-first approach provides clean separation between projects, environments, and teams while enabling rapid context switching.

Overview

A workspace is an isolated environment that groups together:

  • Infrastructure definitions - Nickel schemas, server configs, cluster definitions
  • Configuration settings - Environment-specific settings, provider credentials, user preferences
  • Runtime data - State files, checkpoints, logs, generated configurations
  • Extensions - Custom providers, task services, workflow templates

The workspace system enforces that all infrastructure operations (server creation, task service installation, cluster deployment) require an active workspace. This prevents accidental cross-project modifications and ensures configuration isolation.

Why Workspace-First

Traditional infrastructure tools often mix configurations across projects, leading to:

  • Accidental deployments to wrong environments
  • Configuration drift between dev/staging/production
  • Credential leakage across projects
  • Difficulty tracking infrastructure boundaries

Provisioning’s workspace-first approach solves these problems by making workspace boundaries explicit and enforced at the CLI level.

Workspace Structure

Every workspace follows a consistent directory structure:

workspace_my_project/
├── infra/                    # Infrastructure definitions (Nickel schemas)
│   ├── my-cluster.ncl        # Cluster definition
│   ├── servers.ncl           # Server configurations
│   └── batch-workflows.ncl   # Batch workflow definitions
│
├── config/                   # Workspace configuration
│   ├── local-overrides.toml  # User-specific overrides (gitignored)
│   ├── dev-defaults.toml     # Development environment defaults
│   ├── test-defaults.toml    # Testing environment defaults
│   ├── prod-defaults.toml    # Production environment defaults
│   └── provisioning.yaml     # Workspace metadata and settings
│
├── extensions/               # Workspace-specific extensions
│   ├── providers/            # Custom cloud providers
│   ├── taskservs/            # Custom task services
│   ├── clusters/             # Custom cluster templates
│   └── workflows/            # Custom workflow definitions
│
└── runtime/                  # Runtime data (gitignored)
    ├── state/                # Infrastructure state files
    ├── checkpoints/          # Workflow checkpoints
    ├── logs/                 # Operation logs
    └── generated/            # Generated configuration files

Configuration Hierarchy

Workspace configurations follow a 5-layer hierarchy:

1. System Defaults       (provisioning/config/config.defaults.toml)
   ↓ overridden by
2. User Config           (~/.config/provisioning/user_config.yaml)
   ↓ overridden by
3. Workspace Config      (workspace/config/provisioning.yaml)
   ↓ overridden by
4. Environment Config    (workspace/config/{dev,test,prod}-defaults.toml)
   ↓ overridden by
5. Runtime Flags         (--flag value)

This hierarchy ensures sensible defaults while allowing granular control at every level.
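The override behavior can be sketched as a deep merge where later layers win on conflicting keys; this is an illustrative sketch of the semantics, not the loader's implementation:

```python
def merge(layers):
    """Deep-merge config layers; later layers override earlier ones."""
    result = {}
    for layer in layers:
        for key, value in layer.items():
            if isinstance(value, dict) and isinstance(result.get(key), dict):
                result[key] = merge([result[key], value])  # merge nested tables
            else:
                result[key] = value  # later layer wins
    return result
```

A workspace config overriding `provider.name` keeps the system default for `provider.zone`, while a runtime flag overrides everything below it.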

Core Commands

Creating Workspaces

# Create new workspace
provisioning workspace init my-project

# Create workspace with specific location
provisioning workspace init my-project --path /custom/location

# Create from template
provisioning workspace init my-project --template kubernetes-ha

Listing Workspaces

# List all workspaces
provisioning workspace list

# Show active workspace
provisioning workspace status

# List with details
provisioning workspace list --verbose

Example output:

NAME              PATH                                 LAST_USED           STATUS
my-project        /workspaces/workspace_my_project     2026-01-15 10:30    Active
dev-env           /workspaces/workspace_dev_env        2026-01-14 15:45
production        /workspaces/workspace_production     2026-01-10 09:00

Switching Workspaces

# Switch to different workspace (single command)
provisioning workspace switch my-project

# Switch with validation
provisioning workspace switch production --validate

# Quick switch using shortcut
provisioning ws switch dev-env

Workspace switching updates:

  • Active workspace marker in user configuration
  • Environment variables for current session
  • CLI prompt indicator (if configured)
  • Last-used timestamp

Deleting Workspaces

# Delete workspace (requires confirmation)
provisioning workspace delete old-project

# Force delete without confirmation
provisioning workspace delete old-project --force

# Delete but keep backups
provisioning workspace delete old-project --backup

Deletion safety:

  • Requires explicit confirmation unless --force is used
  • Optionally creates backup before deletion
  • Validates no active operations are running
  • Updates workspace registry

Workspace Registry

The workspace registry is stored in user configuration and tracks all workspaces:

# ~/.config/provisioning/user_config.yaml
workspaces:
  active: my-project
  registry:
    my-project:
      path: /workspaces/workspace_my_project
      created: 2026-01-15T10:30:00Z
      last_used: 2026-01-15T14:20:00Z
      template: default
    dev-env:
      path: /workspaces/workspace_dev_env
      created: 2026-01-10T08:00:00Z
      last_used: 2026-01-14T15:45:00Z
      template: development

This centralized registry enables:

  • Fast workspace discovery
  • Usage tracking and statistics
  • Workspace templates
  • Path resolution

Workspace Enforcement

The CLI enforces workspace requirements for all infrastructure operations:

Workspace-exempt commands (work without active workspace):

  • provisioning help
  • provisioning version
  • provisioning workspace *
  • provisioning guide *
  • provisioning setup *
  • provisioning providers (list only)

Workspace-required commands (require active workspace):

  • provisioning server create
  • provisioning taskserv install
  • provisioning cluster deploy
  • provisioning batch submit
  • All infrastructure modification operations

If no workspace is active, workspace-required commands fail with:

Error: No active workspace
Please activate or create a workspace:
  provisioning workspace init <name>
  provisioning workspace switch <name>

This enforcement prevents accidental infrastructure modifications outside workspace boundaries.
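The enforcement rule above can be sketched as a guard that runs before dispatch: commands outside the exempt set fail when no workspace is active. The command names come from the lists above; the function itself is a hedged illustration, not the CLI's actual implementation.

```python
# Exempt command prefixes, per the workspace-exempt list above.
EXEMPT_PREFIXES = ("help", "version", "workspace", "guide", "setup")

def enforce_workspace(command, active_workspace):
    """Raise if a workspace-required command runs without an active workspace."""
    if command.split()[0] in EXEMPT_PREFIXES:
        return  # exempt commands always run
    if active_workspace is None:
        raise RuntimeError(
            "No active workspace\n"
            "Please activate or create a workspace:\n"
            "  provisioning workspace init <name>\n"
            "  provisioning workspace switch <name>"
        )

enforce_workspace("workspace list", None)      # allowed: exempt command
enforce_workspace("server create", "my-proj")  # allowed: workspace active
```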

Workspace Templates

Templates provide pre-configured workspace structures for common use cases:

Available Templates

Template            Description                                  Use Case
default             Minimal workspace structure                  General purpose infrastructure
kubernetes-ha       HA Kubernetes setup with 3 control planes    Production Kubernetes deployments
development         Dev-optimized with Docker Compose            Local testing and development
multi-cloud         Multiple provider configurations             Multi-cloud deployments
database-cluster    Database-focused with backup configs         Database infrastructure
cicd                CI/CD pipeline configurations                Automated deployment pipelines

Using Templates

# Create from template
provisioning workspace init my-k8s --template kubernetes-ha

# List available templates
provisioning workspace templates

# Show template details
provisioning workspace template show kubernetes-ha

Templates pre-populate:

  • Infrastructure Nickel schemas
  • Provider configurations
  • Environment-specific defaults
  • Example workflow definitions
  • README with usage instructions

Multi-Environment Workflows

Workspaces excel at managing multiple environments:

Strategy 1: Separate Workspaces Per Environment

# Create dedicated workspaces
provisioning workspace init myapp-dev
provisioning workspace init myapp-staging
provisioning workspace init myapp-prod

# Switch between environments
provisioning ws switch myapp-dev
provisioning server create      # Creates in dev

provisioning ws switch myapp-prod
provisioning server create      # Creates in prod (isolated)

Pros: Complete isolation, different credentials, independent state
Cons: More workspace management, duplicate configuration

Strategy 2: Single Workspace, Multiple Environments

# Single workspace with environment configs
provisioning workspace init myapp

# Deploy to different environments using flags
PROVISIONING_ENV=dev provisioning server create
PROVISIONING_ENV=staging provisioning server create
PROVISIONING_ENV=prod provisioning server create

Pros: Shared configuration, easier to maintain
Cons: Shared credentials, risk of cross-environment mistakes

Strategy 3: Hybrid Approach

# Dev workspace for experimentation
provisioning workspace init myapp-dev

# Prod workspace for production only
provisioning workspace init myapp-prod

# Use environment flags within workspaces
provisioning ws switch myapp-prod
PROVISIONING_ENV=prod provisioning cluster deploy

Pros: Balances isolation and convenience
Cons: More complex to explain to teams

Best Practices

Naming Conventions

# Good names (descriptive, unique)
workspace_librecloud_production
workspace_myapp_dev
workspace_k8s_staging

# Avoid (ambiguous, generic)
workspace_test
workspace_1
workspace_temp

Configuration Management

# Version control: Commit these files
infra/**/*.ncl                    # Infrastructure definitions
config/*-defaults.toml             # Environment defaults
config/provisioning.yaml           # Workspace metadata
extensions/**/*                    # Custom extensions

# Gitignore: Never commit these
config/local-overrides.toml        # User-specific overrides
runtime/**/*                       # Runtime data and state
**/*.secret                        # Credential files

Environment Separation

# Use dedicated workspaces for production
provisioning workspace init myapp-prod --template production

# Enable extra validation for production
provisioning ws switch myapp-prod
provisioning config set validation.strict true
provisioning config set confirmation.required true

Team Collaboration

# Share workspace structure via git
git clone repo/myapp-infrastructure
cd myapp-infrastructure
provisioning workspace init . --import

# Each team member creates local-overrides.toml
cat > config/local-overrides.toml <<EOF
[user]
default_region = "us-east-1"
confirmation_required = true
EOF

Troubleshooting

No Active Workspace Error

Error: No active workspace

Solution:

# List workspaces
provisioning workspace list

# Switch to workspace
provisioning workspace switch <name>

# Or create new workspace
provisioning workspace init <name>

Workspace Not Found

Error: Workspace 'my-project' not found in registry

Solution:

# Re-register workspace
provisioning workspace register /path/to/workspace_my_project

# Or recreate workspace
provisioning workspace init my-project

Workspace Path Doesn’t Exist

Error: Workspace path '/workspaces/workspace_my_project' does not exist

Solution:

# Remove invalid entry
provisioning workspace unregister my-project

# Re-create workspace
provisioning workspace init my-project

Integration with Other Features

Batch Workflows

Workspaces provide the context for batch workflow execution:

provisioning ws switch production
provisioning batch submit infra/batch-workflows.ncl

Batch workflows access workspace-specific:

  • Infrastructure definitions
  • Provider credentials
  • Configuration settings
  • State management

Test Environments

Test environments inherit workspace configuration:

provisioning ws switch dev
provisioning test quick kubernetes
# Uses dev workspace's configuration and providers

Version Management

Workspace configurations can specify tool versions:

# workspace/infra/versions.ncl
{
  tools = {
    nushell = "0.109.1"
    nickel = "1.15.1"
    kubernetes = "1.29.0"
  }
}

Provisioning validates versions match workspace requirements.
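The version check can be sketched as a comparison between the workspace's required versions (mirroring versions.ncl above) and the installed ones. This is an illustrative sketch; the function name and the installed-version data are assumptions.

```python
# Required versions mirror the versions.ncl example above.
required = {"nushell": "0.109.1", "nickel": "1.15.1", "kubernetes": "1.29.0"}
installed = {"nushell": "0.109.1", "nickel": "1.15.1", "kubernetes": "1.28.0"}

def version_mismatches(required, installed):
    """Return (tool, required, installed) triples that do not match."""
    return [
        (tool, want, installed.get(tool))
        for tool, want in required.items()
        if installed.get(tool) != want
    ]

for tool, want, have in version_mismatches(required, installed):
    print(f"{tool}: required {want}, found {have}")
# kubernetes: required 1.29.0, found 1.28.0
```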

See Also

CLI Architecture

The Provisioning CLI provides a unified command-line interface for all infrastructure operations. It features 111+ commands organized into 7 domain-focused modules with 80+ shortcuts for improved productivity. The modular architecture achieved 84% code reduction while improving maintainability and extensibility.

Overview

The CLI architecture uses domain-driven design, separating concerns across modules. This refactoring reduced the main entry point from monolithic code to 211 lines. The architecture improves discoverability and enables rapid feature development.

(Diagram: CLI architecture with modular design and decentralized command registration)

Key Metrics

Metric             Before           After         Improvement
Main CLI lines     1,329            211           84% reduction
Command domains    1 (monolithic)   7 (modular)   7x organization
Commands           ~50              111+          122% increase
Shortcuts          0                80+           New capability
Help categories    0                7             Improved discovery

Domain Architecture

The CLI is organized into 7 domain-focused modules:

1. Infrastructure Domain

Commands: Server, TaskServ, Cluster, Infra management

# Server operations
provisioning server create
provisioning server list
provisioning server delete
provisioning server ssh <hostname>

# Task service operations
provisioning taskserv install kubernetes
provisioning taskserv list
provisioning taskserv remove kubernetes

# Cluster operations
provisioning cluster deploy my-cluster
provisioning cluster status my-cluster
provisioning cluster scale my-cluster --nodes 5

Shortcuts: s (server), t/task (taskserv), cl (cluster), i (infra)

2. Orchestration Domain

Commands: Workflow, Batch, Orchestrator management

# Workflow operations
provisioning workflow list
provisioning workflow status <id>
provisioning workflow cancel <id>

# Batch operations
provisioning batch submit infra/batch-workflows.ncl
provisioning batch monitor <workflow-id>
provisioning batch list

# Orchestrator management
provisioning orchestrator start
provisioning orchestrator status
provisioning orchestrator logs

Shortcuts: wf/flow (workflow), bat (batch), orch (orchestrator)

3. Development Domain

Commands: Module, Layer, Version, Pack management

# Module operations
provisioning module create my-module
provisioning module list
provisioning module test my-module

# Layer operations
provisioning layer add <name>
provisioning layer list

# Versioning
provisioning version bump minor
provisioning version list

# Packaging
provisioning pack create my-extension
provisioning pack publish my-extension

Shortcuts: mod (module), l (layer), v (version), p (pack)

4. Workspace Domain

Commands: Workspace management, templates

# Workspace operations
provisioning workspace init my-project
provisioning workspace list
provisioning workspace switch my-project
provisioning workspace delete old-project

# Template operations
provisioning workspace template list
provisioning workspace template show kubernetes-ha

Shortcuts: ws (workspace)

5. Configuration Domain

Commands: Config, Environment, Validate, Setup

# Configuration operations
provisioning config get servers.default_plan
provisioning config set servers.default_plan large
provisioning config validate

# Environment operations
provisioning env
provisioning allenv

# Setup operations
provisioning setup profile --profile developer
provisioning setup versions

# Validation
provisioning validate config
provisioning validate infra
provisioning validate nickel workspace/infra/my-cluster.ncl

Shortcuts: cfg (config), val (validate), st (setup)

6. Utilities Domain

Commands: SSH, SOPS, Cache, Plugin management

# SSH operations
provisioning ssh server-01
provisioning ssh server-01 -- uptime

# SOPS operations
provisioning sops encrypt config.yaml
provisioning sops decrypt config.enc.yaml

# Cache operations
provisioning cache clear
provisioning cache stats

# Plugin operations
provisioning plugin list
provisioning plugin install nu_plugin_auth
provisioning plugin update

Shortcuts: sops, cache, plug (plugin)

7. Generation Domain

Commands: Generate code, configs, docs

# Code generation
provisioning generate provider upcloud-new
provisioning generate taskserv postgresql
provisioning generate cluster k8s-ha

# Config generation
provisioning generate config --profile production
provisioning generate nickel --template kubernetes

# Documentation generation
provisioning generate docs

Shortcuts: g/gen (generate)

Command Shortcuts

The CLI provides 80+ shortcuts for improved productivity:

Infrastructure Shortcuts

Full Command     Shortcuts   Example
server           s           provisioning s list
taskserv         t, task     provisioning t install kubernetes
cluster          cl          provisioning cl deploy my-cluster
infrastructure   i, infra    provisioning i list

Orchestration Shortcuts

Full Command   Shortcuts   Example
workflow       wf, flow    provisioning wf list
batch          bat         provisioning bat submit workflow.ncl
orchestrator   orch        provisioning orch status

Development Shortcuts

Full Command   Shortcuts   Example
module         mod         provisioning mod list
layer          l           provisioning l add base
version        v           provisioning v bump minor
pack           p           provisioning p create extension

Configuration Shortcuts

Full Command   Shortcuts   Example
workspace      ws          provisioning ws switch prod
config         cfg         provisioning cfg get servers.plan
validate       val         provisioning val config
setup          st          provisioning st profile --profile dev
environment    env         provisioning env

Utility Shortcuts

Full Command   Shortcuts   Example
generate       g, gen      provisioning g provider aws-new
plugin         plug        provisioning plug list

Quick Reference Shortcuts

Full Command   Shortcuts   Purpose
shortcuts      sc          Show shortcuts reference
guide          -           Interactive guides
howto          -           Quick how-to guides
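Shortcut resolution amounts to a table-driven alias map applied before dispatch. The sketch below is illustrative Python (not the CLI's Nushell implementation); the mapping entries are taken from the shortcut tables above.

```python
# Alias map taken from the shortcut tables above (subset shown).
SHORTCUTS = {
    "s": "server", "t": "taskserv", "task": "taskserv", "cl": "cluster",
    "wf": "workflow", "flow": "workflow", "bat": "batch", "orch": "orchestrator",
    "ws": "workspace", "cfg": "config", "val": "validate", "st": "setup",
    "g": "generate", "gen": "generate", "plug": "plugin", "sc": "shortcuts",
}

def resolve(command):
    """Map a shortcut to its canonical command; pass full names through."""
    return SHORTCUTS.get(command, command)

print(resolve("cl"))      # cluster
print(resolve("server"))  # server
```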

Bi-Directional Help System

The CLI's help system is bi-directional: help for a topic can be requested with the help keyword either before or after the command name:

# Both of these work identically
provisioning help workspace
provisioning workspace help

# Shortcuts also work
provisioning help ws
provisioning ws help

# Category help
provisioning help infrastructure
provisioning help orchestration

This flexibility improves discoverability and aligns with natural user expectations.
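The normalization rule above can be sketched in a few lines: both argument orders resolve to the same help topic. This Python function is an illustrative assumption, not the CLI's actual parser.

```python
def help_topic(args):
    """Return the help topic if the invocation is a help request, else None."""
    if len(args) == 2 and "help" in args:
        return args[0] if args[1] == "help" else args[1]
    return None

print(help_topic(["help", "workspace"]))  # workspace
print(help_topic(["workspace", "help"]))  # workspace
print(help_topic(["server", "create"]))   # None
```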

Centralized Flag Handling

All global flags are handled consistently across all commands:

Global Flags

Flag        Short   Purpose                           Example
--debug     -d      Enable debug mode                 provisioning --debug server create
--check     -c      Dry-run mode (no changes)         provisioning --check server delete
--yes       -y      Auto-confirm operations           provisioning --yes cluster delete
--infra     -i      Specify infrastructure            provisioning --infra my-cluster server list
--verbose   -v      Verbose output                    provisioning --verbose workflow list
--quiet     -q      Minimal output                    provisioning --quiet batch submit
--format    -f      Output format (json/yaml/table)   provisioning --format json server list

Command-Specific Flags

# Server creation flags
provisioning server create --plan large --region us-east-1 --zone a

# TaskServ installation flags
provisioning taskserv install kubernetes --version 1.29.0 --ha

# Cluster deployment flags
provisioning cluster deploy --replicas 3 --storage 100GB

# Batch workflow flags
provisioning batch submit workflow.ncl --parallel 5 --timeout 3600

Command Discovery

Categorized Help

The help system organizes commands by domain:

provisioning help

# Output shows categorized commands:
Infrastructure Commands:
  server        Manage servers (shortcuts: s)
  taskserv      Manage task services (shortcuts: t, task)
  cluster       Manage clusters (shortcuts: cl)

Orchestration Commands:
  workflow      Manage workflows (shortcuts: wf, flow)
  batch         Batch operations (shortcuts: bat)
  orchestrator  Orchestrator management (shortcuts: orch)

Configuration Commands:
  workspace     Workspace management (shortcuts: ws)
  config        Configuration management (shortcuts: cfg)
  validate      Validation operations (shortcuts: val)
  setup         System setup (shortcuts: st)

Quick Reference

# Fastest command reference
provisioning sc

# Shows comprehensive shortcuts table with examples

Interactive Guides

# Step-by-step guides
provisioning guide from-scratch      # Complete deployment guide
provisioning guide quickstart         # Command shortcuts reference
provisioning guide customize          # Customization patterns

Command Routing

The CLI uses a sophisticated dispatcher for command routing:

# provisioning/core/nulib/main_provisioning/dispatcher.nu

# Route command to appropriate handler
export def dispatch [
    command: string
    args: list<string>
] {
    match $command {
        # Infrastructure domain
        "server" | "s" => { route-to-handler "infrastructure" "server" $args }
        "taskserv" | "t" | "task" => { route-to-handler "infrastructure" "taskserv" $args }
        "cluster" | "cl" => { route-to-handler "infrastructure" "cluster" $args }

        # Orchestration domain
        "workflow" | "wf" | "flow" => { route-to-handler "orchestration" "workflow" $args }
        "batch" | "bat" => { route-to-handler "orchestration" "batch" $args }

        # Configuration domain
        "workspace" | "ws" => { route-to-handler "configuration" "workspace" $args }
        "config" | "cfg" => { route-to-handler "configuration" "config" $args }
    }
}

This routing enables:

  • Consistent error handling
  • Centralized logging
  • Workspace enforcement
  • Permission checks
  • Audit trail

Command Implementation Pattern

All commands follow a consistent implementation pattern:

# Example: provisioning/core/nulib/main_provisioning/commands/server.nu

# Main command handler
export def main [
    operation: string    # create, list, delete, etc.
    ...args: string      # operation arguments
    --check              # Dry-run mode
    --yes                # Auto-confirm
] {
    # 1. Validate workspace requirement
    enforce-workspace-requirement "server" $operation

    # 2. Load configuration
    let config = load-config

    # 3. Parse operation
    match $operation {
        "create" => { create-server $args $config --check=$check --yes=$yes }
        "list" => { list-servers $config }
        "delete" => { delete-server $args $config --yes=$yes }
        "ssh" => { ssh-to-server $args $config }
        _ => { error make { msg: $"Unknown server operation: ($operation)" } }
    }

    # 4. Log operation (audit trail)
    log-operation "server" $operation $args
}

This pattern ensures:

  • Consistent behavior
  • Proper error handling
  • Configuration integration
  • Workspace enforcement
  • Audit logging

Modular Structure

The CLI codebase is organized for maintainability:

provisioning/core/
├── cli/
│   └── provisioning           # Main CLI entry point (211 lines)
│
├── nulib/
│   ├── main_provisioning/
│   │   ├── dispatcher.nu      # Command routing (central dispatch)
│   │   ├── flags.nu           # Centralized flag handling
│   │   ├── help_system_fluent.nu  # Categorized help with i18n
│   │   │
│   │   └── commands/          # Domain-specific command handlers
│   │       ├── infrastructure/
│   │       │   ├── server.nu
│   │       │   ├── taskserv.nu
│   │       │   └── cluster.nu
│   │       │
│   │       ├── orchestration/
│   │       │   ├── workflow.nu
│   │       │   ├── batch.nu
│   │       │   └── orchestrator.nu
│   │       │
│   │       ├── configuration/
│   │       │   ├── workspace.nu
│   │       │   ├── config.nu
│   │       │   └── validate.nu
│   │       │
│   │       └── utilities/
│   │           ├── ssh.nu
│   │           ├── sops.nu
│   │           └── cache.nu
│   │
│   └── lib_provisioning/      # Core libraries (used by commands)
│       ├── config/
│       ├── providers/
│       ├── workspace/
│       └── utils/

This structure enables:

  • Clear separation of concerns
  • Easy addition of new commands
  • Testable command handlers
  • Reusable core libraries

Internationalization

The CLI supports multiple languages via Fluent catalog:

# Automatic locale detection
export LANG=es_ES.UTF-8
provisioning help    # Shows Spanish help if es-ES catalog exists

# Supported locales
en-US (default)      # English
es-ES                # Spanish
fr-FR                # French
de-DE                # German

Catalog structure:

provisioning/locales/
├── en-US/
│   └── help.ftl      # English help strings
├── es-ES/
│   └── help.ftl      # Spanish help strings
└── de-DE/
    └── help.ftl      # German help strings

Extension Points

The modular architecture provides clean extension points:

Adding New Commands

# 1. Create command handler
provisioning/core/nulib/main_provisioning/commands/my_new_command.nu

# 2. Register in dispatcher
# provisioning/core/nulib/main_provisioning/dispatcher.nu
"my-command" | "mc" => { route-to-handler "utilities" "my-command" $args }

# 3. Add help entry
# provisioning/locales/en-US/help.ftl
my-command-help = Manage my new feature

# 4. Command is now available
provisioning my-command <operation>
provisioning mc <operation>  # Shortcut also works

Adding New Domains

# 1. Create domain directory
provisioning/core/nulib/main_provisioning/commands/my_domain/

# 2. Add domain commands
my_domain/
├── command1.nu
├── command2.nu
└── command3.nu

# 3. Register domain in dispatcher

# 4. Add domain help category

# Domain is now available with all commands

Command Aliases

The CLI supports command aliases for common operations:

# Defined in user configuration
# ~/.config/provisioning/user_config.yaml
aliases:
  deploy: "cluster deploy"
  list-all: "server list && taskserv list && cluster list"
  quick-test: "test quick kubernetes"

# Usage
provisioning deploy my-cluster     # Expands to: cluster deploy my-cluster
provisioning list-all              # Runs multiple commands
provisioning quick-test            # Runs test with preset
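The alias expansion above, including the multi-command "&&" form, can be sketched as follows. This is an illustrative Python sketch; the alias entries mirror the example configuration, and the expander itself is an assumption.

```python
# Aliases mirroring the user_config.yaml example above.
aliases = {
    "deploy": "cluster deploy",
    "list-all": "server list && taskserv list && cluster list",
}

def expand(invocation):
    """Expand an alias into one or more full command strings."""
    head, _, rest = invocation.partition(" ")
    body = aliases.get(head, head)
    commands = [c.strip() for c in body.split("&&")]
    if rest:  # append remaining arguments to the final command
        commands[-1] = f"{commands[-1]} {rest}"
    return commands

print(expand("deploy my-cluster"))  # ['cluster deploy my-cluster']
print(expand("list-all"))           # ['server list', 'taskserv list', 'cluster list']
```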

Best Practices

Using Shortcuts Effectively

# Development workflow (frequent commands)
provisioning ws switch dev          # Switch to dev workspace
provisioning s list                 # Quick server list
provisioning t install postgres     # Install task service
provisioning cl status my-cluster   # Check cluster status

# Production workflow (explicit commands for clarity)
provisioning workspace switch production
provisioning server create --plan large --check
provisioning cluster deploy critical-cluster --yes

Dry-Run Before Execution

# Always check before dangerous operations
provisioning --check server delete old-servers
provisioning --check cluster delete test-cluster

# If output looks good, run for real
provisioning --yes server delete old-servers

Using Output Formats

# JSON output for scripting
provisioning --format json server list | jq '.[] | select(.status == "running")'

# YAML output for readability
provisioning --format yaml cluster status my-cluster

# Table output for humans (default)
provisioning server list

Performance Optimizations

The modular architecture enables several performance optimizations:

Lazy Loading

Commands are loaded on-demand, reducing startup time:

# Only loads server command module when needed
provisioning server list    # Fast startup (loads server.nu only)

Command Caching

Frequently-used commands benefit from caching:

# First run: ~200ms (loads modules, config)
provisioning server list

# Subsequent runs: ~50ms (cached config, loaded modules)
provisioning server list
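The caching effect can be sketched with a memoized loader: the expensive parse/merge runs once, and repeated calls hit the cache. Here functools.lru_cache stands in for the platform's own cache, which is an assumption of this sketch.

```python
from functools import lru_cache
import time

@lru_cache(maxsize=1)
def load_config():
    """Simulate a slow configuration parse/merge on first load."""
    time.sleep(0.15)
    return {"servers": {"default_plan": "medium"}}

t0 = time.perf_counter(); load_config(); first = time.perf_counter() - t0
t0 = time.perf_counter(); load_config(); second = time.perf_counter() - t0
print(f"first: {first*1000:.0f}ms, cached: {second*1000:.0f}ms")
```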

Parallel Execution

Batch operations execute in parallel:

# Executes server creation in parallel (up to configured limit)
provisioning batch submit multi-server-workflow.ncl --parallel 10

Troubleshooting

Command Not Found

Error: Unknown command 'servr'
Did you mean: server (s)

The CLI provides helpful suggestions for typos.
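The suggestion behavior can be sketched with fuzzy matching over the known command names; this uses Python's difflib as an illustration, not the CLI's actual implementation.

```python
from difflib import get_close_matches

COMMANDS = ["server", "taskserv", "cluster", "workflow", "batch",
            "workspace", "config", "validate", "setup", "generate"]

def suggest(typo):
    """Return the closest known command, or None if nothing is similar."""
    matches = get_close_matches(typo, COMMANDS, n=1, cutoff=0.6)
    return matches[0] if matches else None

print(suggest("servr"))   # server
print(suggest("clustr"))  # cluster
```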

Missing Workspace

Error: No active workspace
Please activate or create a workspace:
  provisioning workspace init <name>
  provisioning workspace switch <name>

Workspace enforcement prevents accidental operations.

Permission Denied

Error: Operation requires admin permissions
Please run with elevated privileges or contact administrator

Permission system prevents unauthorized operations.

See Also

Configuration System

Batch Workflows

Orchestrator

Interactive Guides

Test Environment

Platform Installer

Security System

Version Management

Nushell Plugins

Provisioning includes 17 high-performance native Rust plugins for Nushell, providing 10-50x speed improvements over HTTP APIs. Plugins handle critical functionality: templates, configuration, encryption, orchestration, and secrets management.

Overview

Performance Benefits

Plugins provide significant performance improvements for frequently-used operations:

Plugin                   Speed Improvement   Use Case
nu_plugin_tera           10-15x faster       Template rendering
nu_plugin_nickel         5-8x faster         Configuration processing
nu_plugin_orchestrator   30-50x faster       Query orchestrator state
nu_plugin_kms            10x faster          Encryption/decryption
nu_plugin_auth           5x faster           Authentication operations

Installation

All plugins install automatically with Provisioning:

# Automatic installation during setup
provisioning install

# Or manual installation
cd /path/to/provisioning
./scripts/install-plugins.nu

# Verify installation
provisioning plugins list

Plugin Management

# List installed plugins with versions
provisioning plugins list

# Check plugin status
provisioning plugins status

# Update all plugins
provisioning plugins update --all

# Update specific plugin
provisioning plugins update nu_plugin_tera

# Remove plugin
provisioning plugins remove nu_plugin_tera

Core Plugins (Priority)

1. nu_plugin_tera

Template Rendering Engine

Nushell plugin for Tera template processing (Jinja2-style syntax).

# Install
provisioning plugins install nu_plugin_tera

# Usage in Nushell
let template = "Hello {{ name }}!"
let context = { name: "World" }
$template | tera render $context
# Output: "Hello World!"

Features:

  • Jinja2-compatible syntax
  • Built-in filters and functions
  • Template inheritance
  • Macro support
  • Custom filters via Rust

Performance: 10-15x faster than HTTP template service

Use Cases:

  • Generating infrastructure configurations
  • Creating dynamic scripts
  • Building deployment templates
  • Rendering documentation

Example: Generate infrastructure config:

let infra_template = "
{
  servers = [
    {% for server in servers %}
    {
      name = \"{{ server.name }}\"
      cpu = {{ server.cpu }}
      memory = {{ server.memory }}
    }
    {% if not loop.last %},{% endif %}
    {% endfor %}
  ]
}
"

let servers = [
  { name: "web-01", cpu: 4, memory: 8 }
  { name: "web-02", cpu: 4, memory: 8 }
]

$infra_template | tera render { servers: $servers }

2. nu_plugin_nickel

Nickel Configuration Plugin

Native Nickel compilation and validation in Nushell.

# Install
provisioning plugins install nu_plugin_nickel

# Usage in Nushell
let nickel_code = '{ name = "server", cpu = 4 }'
$nickel_code | nickel eval
# Output: { name: "server", cpu: 4 }

Features:

  • Parse and evaluate Nickel expressions
  • Type checking and validation
  • Schema enforcement
  • Merge configurations
  • Generate JSON/YAML output

Performance: 5-8x faster than CLI invocation

Use Cases:

  • Validate infrastructure definitions
  • Process Nickel schemas
  • Merge configuration files
  • Generate typed configurations

Example: Validate and merge configs:

let base_config = open base.ncl | nickel eval
let env_config = open prod-defaults.ncl | nickel eval

let merged = $base_config | nickel merge $env_config
$merged | nickel validate --schema infrastructure-schema.ncl

3. nu_plugin_fluent

Internationalization (i18n) Plugin

Fluent translation system for multi-language support.

# Install
provisioning plugins install nu_plugin_fluent

# Usage in Nushell
fluent load "./locales"
fluent set-locale "es-ES"
fluent get "help-infra-server-create"
# Output: "Crear un nuevo servidor"

Features:

  • Load Fluent catalogs (.ftl files)
  • Dynamic locale switching
  • Pluralization support
  • Fallback chains
  • Translation coverage reports

Performance: Native Rust implementation, <1ms per translation

Use Cases:

  • CLI help text in multiple languages
  • Form labels and prompts
  • Error messages
  • Interactive guides

Supported Locales:

  • en-US (English)
  • es-ES (Spanish)
  • pt-BR (Portuguese - planned)
  • fr-FR (French - planned)
  • ja-JP (Japanese - planned)

Example: Multi-language help system:

fluent load "provisioning/locales"

# Spanish help
fluent set-locale "es-ES"
fluent get "help-main-title"    # "SISTEMA DE PROVISIÓN"

# English help (fallback)
fluent set-locale "fr-FR"
fluent get "help-main-title"    # Falls back to "PROVISIONING SYSTEM"

4. nu_plugin_secretumvault

Post-Quantum Cryptography Vault

SecretumVault integration for quantum-resistant secret storage.

# Install
provisioning plugins install nu_plugin_secretumvault

# Usage in Nushell
secretumvault-plugin store "api-key" "secret-value"
let key = secretumvault-plugin retrieve "api-key"
secretumvault-plugin delete "api-key"

Features:

  • CRYSTALS-Kyber encryption (post-quantum)
  • Hybrid encryption (PQC + AES-256)
  • Secure credential injection
  • Key rotation
  • Audit logging

Performance: <100ms for encrypt/decrypt operations

Use Cases:

  • Store infrastructure credentials
  • Manage API keys
  • Handle database passwords
  • Secure configuration values

Example: Secure credential management:

# Store credentials in vault
secretumvault-plugin store "aws-access-key" "AKIAIOSFODNN7EXAMPLE"
secretumvault-plugin store "aws-secret-key" "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"

# Retrieve for use
let aws_key = secretumvault-plugin retrieve "aws-access-key"
provisioning aws configure --access-key $aws_key

Performance Plugins

5. nu_plugin_orchestrator

Orchestrator State Query Plugin

High-speed queries to orchestrator state and workflow data.

# Install
provisioning plugins install nu_plugin_orchestrator

# Usage in Nushell
orchestrator query workflows --filter status=running
orchestrator query tasks --limit 100
orchestrator query checkpoints --workflow deploy-k8s

Performance: 30-50x faster than HTTP API

Queries:

  • Workflows (list, status, logs)
  • Tasks (state, duration, dependencies)
  • Checkpoints (recovery points)
  • History (audit trail)

Example: Monitor running workflows:

let running = orchestrator query workflows --filter status=running
$running | each {|w|
  print $"Workflow: ($w.name) - ($w.progress)%"
}

6. nu_plugin_kms

Key Management System (Encryption) Plugin

Fast encryption/decryption with KMS backends.

# Install
provisioning plugins install nu_plugin_kms

# Usage in Nushell
let encrypted = "secret-data" | kms encrypt --algorithm aes-256-gcm
$encrypted | kms decrypt

Performance: 10x faster than external KMS calls, 5ms encryption

Supported Algorithms:

  • AES-256-GCM
  • ChaCha20-Poly1305
  • Kyber (post-quantum)
  • Falcon (signatures)

Features:

  • Symmetric encryption
  • Key derivation (Argon2id, PBKDF2)
  • Authenticated encryption
  • HSM integration (optional)

Example: Encrypt infrastructure secrets:

let config = open infrastructure.ncl
let encrypted = $config | kms encrypt --key master-key

# Decrypt when needed
let decrypted = $encrypted | kms decrypt --key master-key
$decrypted | nickel eval

7. nu_plugin_auth

Authentication Plugin

Multi-method authentication with keyring integration.

# Install
provisioning plugins install nu_plugin_auth

# Usage in Nushell
let token = auth login --method jwt --provider openid
auth set-token $token
auth verify-token

Performance: 5x faster local authentication

Features:

  • JWT token generation and validation
  • OAuth2 support
  • SAML support
  • OS keyring integration
  • MFA support

Methods:

  • JWT (JSON Web Tokens)
  • OAuth2 (GitHub, Google, Microsoft)
  • SAML
  • LDAP
  • Local keyring

Example: Authenticate and store credentials:

# Login and get token
let token = auth login --method oauth2 --provider github
auth set-token $token --store-keyring

# Verify authentication
auth verify-token      # Check if token valid
auth whoami            # Show current user

Utility Plugins

8. nu_plugin_hashes

Cryptographic Hashing Plugin

Multiple hash algorithms for data integrity.

# Install
provisioning plugins install nu_plugin_hashes

# Usage in Nushell
"data" | hashes sha256
"data" | hashes blake3

Algorithms:

  • SHA256, SHA512
  • BLAKE3
  • MD5 (legacy)
  • SHA1 (legacy)
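The same integrity checks can be sketched with Python's stdlib hashlib; BLAKE2 stands in here for BLAKE3, which is not in the stdlib. Illustrative only.

```python
import hashlib

data = b"data"
sha = hashlib.sha256(data).hexdigest()
b2 = hashlib.blake2b(data, digest_size=32).hexdigest()
print(sha)
print(b2)
```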

9. nu_plugin_highlight

Syntax Highlighting Plugin

Code syntax highlighting for display and logging.

# Install
provisioning plugins install nu_plugin_highlight

# Usage in Nushell
open script.sh | highlight --language bash
open config.ncl | highlight --language nickel

Languages:

  • Bash/Shell
  • Nickel
  • YAML
  • JSON
  • Rust
  • SQL
  • Others

10. nu_plugin_image

Image Processing Plugin

Image manipulation and format conversion.

# Install
provisioning plugins install nu_plugin_image

# Usage in Nushell
open diagram.png | image resize --width 800 --height 600
open logo.jpg | image convert --format webp

Operations:

  • Resize, crop, rotate
  • Format conversion
  • Compression
  • Metadata extraction

11. nu_plugin_clipboard

Clipboard Management Plugin

Read/write system clipboard.

# Install
provisioning plugins install nu_plugin_clipboard

# Usage in Nushell
"api-key" | clipboard copy
clipboard paste

Features:

  • Copy to clipboard
  • Paste from clipboard
  • Manage clipboard history
  • Cross-platform support

12. nu_plugin_desktop_notifications

Desktop Notifications Plugin

System notifications for long-running operations.

# Install
provisioning plugins install nu_plugin_desktop_notifications

# Usage in Nushell
notifications notify "Deployment completed" --type success
notifications notify "Errors detected" --type error

Features:

  • Success, warning, error notifications
  • Custom titles and messages
  • Sound alerts

13. nu_plugin_qr_maker

QR Code Generator Plugin

Generate QR codes for configuration sharing.

# Install
provisioning plugins install nu_plugin_qr_maker

# Usage in Nushell
" [https://example.com/config"](https://example.com/config") | qr-maker generate --output config.png
"workspace-setup-command" | qr-maker generate --ascii

14. nu_plugin_port_extension

Port/Network Utilities Plugin

Network port management and diagnostics.

# Install
provisioning plugins install nu_plugin_port_extension

# Usage in Nushell
port-extension list-open --port 8080
port-extension check-available --port 9000

Legacy/Secondary Plugins

15. nu_plugin_kcl

KCL Configuration Plugin (DEPRECATED)

Legacy KCL support (Nickel is preferred).

⚠️ Status: Deprecated - Use nu_plugin_nickel instead

# Install
provisioning plugins install nu_plugin_kcl

# Usage (not recommended)
let config = open config.kcl | kcl eval

16. api_nu_plugin_kcl

KCL API Plugin (DEPRECATED)

HTTP API wrapper for KCL.

⚠️ Status: Deprecated - Use nu_plugin_nickel instead


17. _nu_plugin_inquire (Historical)

Interactive Prompts Plugin (HISTORICAL)

Old inquiry/prompt system, replaced by TypeDialog.

⚠️ Status: Historical/archived


Plugin Installation & Management

Installation Methods

Automatic with Provisioning:

provisioning install
# Installs all recommended plugins automatically

Selective Installation:

# Install specific plugins
provisioning plugins install nu_plugin_tera nu_plugin_nickel nu_plugin_secretumvault

# Install plugin category
provisioning plugins install --category core          # Essential plugins
provisioning plugins install --category performance   # Performance plugins
provisioning plugins install --category utilities     # Utility plugins

Manual Installation:

# Build and install from source (path relative to the repository root)
cd plugins/nushell-plugins/nu_plugin_tera
cargo install --path .

# Then load in Nushell
plugin add nu_plugin_tera

Configuration

Plugin Loading in Nushell:

# In env.nu or config.nu
plugin add nu_plugin_tera
plugin add nu_plugin_nickel
plugin add nu_plugin_secretumvault
plugin add nu_plugin_fluent
plugin add nu_plugin_auth
plugin add nu_plugin_kms
plugin add nu_plugin_orchestrator

# And more...

Plugin Status:

# Check all plugins
provisioning plugins list

# Check specific plugin
provisioning plugins status nu_plugin_tera

# Detailed information
provisioning plugins info nu_plugin_tera --verbose

Best Practices

Use Plugins When

  • ✅ Processing large amounts of data (templates, config)
  • ✅ Sensitive operations (encryption, secrets)
  • ✅ Frequent operations (queries, auth)
  • ✅ Performance critical paths

Fallback to HTTP API When

  • ❌ Plugin not installed (automatic fallback)
  • ❌ Older Nushell version incompatible
  • ❌ Special features only in API

# Plugins have automatic fallback
# If nu_plugin_tera not available, uses HTTP API automatically
let template = "{{ name }}" | tera render { name: "test" }
# Works either way

Troubleshooting

Plugin Not Loading

# Reload Nushell
nu

# Check plugin errors
plugin list --debug

# Reinstall plugin
provisioning plugins remove nu_plugin_tera
provisioning plugins install nu_plugin_tera

Performance Issues

# Check plugin status
provisioning plugins status

# Monitor plugin usage
provisioning monitor plugins

# Profile plugin calls
provisioning profile nu_plugin_tera

Multilingual Support

Provisioning includes comprehensive multilingual support for help text, forms, and interactive interfaces. The system uses Mozilla Fluent for translations with automatic fallback chains.

Supported Languages

Translation coverage by locale:

| Language | Locale | Status | Strings |
|----------|--------|--------|---------|
| English (US) | en-US | ✅ Complete | 245 |
| Spanish (Spain) | es-ES | ✅ Complete | 245 |
| Portuguese (Brazil) | pt-BR | 🔄 Planned | - |
| French (France) | fr-FR | 🔄 Planned | - |
| Japanese (Japan) | ja-JP | 🔄 Planned | - |

Coverage Requirement: 95% of strings translated to critical locales (en-US, es-ES).

Using Different Languages

Setting Language via Environment Variable

Select language using the LANG environment variable:

# English (default)
provisioning help infrastructure

# Spanish
LANG=es_ES provisioning help infrastructure

# Fallback to English if locale not available
LANG=fr_FR provisioning help infrastructure
# Output: English (en-US) [fallback chain]

Locale Resolution

Language selection follows this order:

  1. Check LANG environment variable (e.g., es_ES)
  2. Match to configured locale (es-ES)
  3. If not found, follow fallback chain (es-ES → en-US)
  4. Default to en-US if no match

Format: LANG uses underscore (es_ES), locales use hyphen (es-ES). System handles conversion automatically.
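
The conversion can be sketched in plain shell (behavior inferred from the description above; the platform performs this internally):

```shell
# Hedged sketch of the LANG -> locale conversion described above.
# Strips an optional encoding suffix and swaps "_" for "-".
lang="${LANG:-en_US}"
locale=$(printf '%s' "${lang%%.*}" | tr '_' '-')
echo "$locale"   # es_ES.UTF-8 -> es-ES
```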

Translation System Architecture

Mozilla Fluent Format

All translations use Mozilla Fluent (.ftl files), which provides:

  • Simple Syntax: Key-value pairs with rich formatting
  • Pluralization: Support for language-specific plural rules
  • Attributes: Multiple values per key for contextual translation
  • Automatic Fallback: Chain resolution when keys missing
  • Extensibility: Support for custom formatting functions

Example Fluent syntax:

help-infra-server-create = Create a new server
form-database_type-option-postgres = PostgreSQL (Recommended)
form-replicas-prompt = Number of replicas
form-replicas-help = How many replicas to run

File Organization

provisioning/locales/
├── i18n-config.toml              # Central i18n configuration
├── en-US/                         # English base language
│   ├── help.ftl                  # Help system strings (65 keys)
│   └── forms.ftl                 # Form strings (180 keys)
└── es-ES/                         # Spanish translations
    ├── help.ftl                  # Help system translations
    └── forms.ftl                 # Form translations

String Categories:

  • help.ftl (65 strings): Help text, menu items, category descriptions, error messages
  • forms.ftl (180 strings): Form labels, placeholders, help text, options

Help System Translations

Help system provides multi-language support for all command categories:

Categories Covered

| Category | Coverage | Example Keys |
|----------|----------|--------------|
| Infrastructure | ✅ 21 strings | server commands, taskserv, clusters, VMs |
| Orchestration | ✅ 18 strings | workflows, batch operations, orchestrator |
| Workspace | ✅ Complete | workspace management, templates |
| Setup | ✅ Complete | system configuration, initialization |
| Authentication | ✅ Complete | JWT, MFA, sessions |
| Platform | ✅ Complete | services, Control Center, MCP |
| Development | ✅ Complete | modules, versions, plugins |
| Utilities | ✅ Complete | providers, SOPS, SSH |

Example: Help Output in Spanish

$ LANG=es_ES provisioning help infrastructure
SERVIDOR E INFRAESTRUCTURA
Gestión de servidores, taskserv, clusters, VM e infraestructura.

COMANDOS DE SERVIDOR
  server create         Crear un nuevo servidor
  server delete         Eliminar un servidor existente
  server list           Listar todos los servidores
  server status         Ver estado de un servidor

COMANDOS DE TASKSERV
  taskserv create       Crear un nuevo servicio de tarea
  taskserv delete       Eliminar un servicio de tarea
  taskserv configure    Configurar un servicio de tarea
  taskserv status       Ver estado del servicio de tarea

Form Translations (TypeDialog Integration)

Interactive forms automatically use the selected language:

Setup Form

Project information, database configuration, API settings, deployment options, security, etc.

# English form
$ provisioning setup profile
📦 Project name: [my-app]

# Spanish form
$ LANG=es_ES provisioning setup profile
📦 Nombre del proyecto: [mi-app]

Translated Form Fields

Each form field has four translated strings:

| Component | Purpose | Example en-US | Example es-ES |
|-----------|---------|---------------|---------------|
| prompt | Field label | “Project name” | “Nombre del proyecto” |
| help | Helper text | “Project name (lowercase alphanumeric with hyphens)” | “Nombre del proyecto (minúsculas alfanuméricas con guiones)” |
| placeholder | Example value | “my-app” | “mi-app” |
| option | Dropdown choice | “PostgreSQL (Recommended)” | “PostgreSQL (Recomendado)” |
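
Put together, the strings for a single field look like this in the .ftl files (values taken from the examples above):

```
# en-US/forms.ftl
form-project_name-prompt = Project name
form-project_name-help = Project name (lowercase alphanumeric with hyphens)
form-project_name-placeholder = my-app

# es-ES/forms.ftl
form-project_name-prompt = Nombre del proyecto
form-project_name-help = Nombre del proyecto (minúsculas alfanuméricas con guiones)
form-project_name-placeholder = mi-app
```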

Supported Forms

  • Unified Setup: Project info, database, API, deployment, security, terms
  • Authentication: Login form (username, password, remember me, forgot password)
  • Setup Wizard: Quick/standard/advanced modes
  • MFA Enrollment: TOTP, SMS, backup codes, device management
  • Infrastructure: Delete confirmations, resource prompts, data retention

Fallback Chain Configuration

When a translation string is missing, the system automatically falls back to the parent locale:

# From i18n-config.toml
[fallback_chains]
es-ES = ["en-US"]
pt-BR = ["pt-PT", "es-ES", "en-US"]
fr-FR = ["en-US"]
ja-JP = ["en-US"]

Resolution Example:

  1. User requests Spanish (es-ES): provisioning help
  2. Look for string in es-ES/help.ftl
  3. If missing, fallback to en-US (help-infra-server-create = "Create a new server")
  4. If still missing, use literal key name as display text
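
The resolution order above can be sketched in shell (file layout from the earlier section; the grep-style lookup is an illustration, not the real implementation):

```shell
# Hedged sketch: resolve a Fluent key through a fallback chain of locales.
resolve_key() {
  key="$1"; shift
  for loc in "$@"; do                      # locales in fallback order
    file="locales/$loc/help.ftl"
    if [ -f "$file" ]; then
      val=$(sed -n "s/^$key *= *//p" "$file" | head -n 1)
      if [ -n "$val" ]; then printf '%s\n' "$val"; return 0; fi
    fi
  done
  printf '%s\n' "$key"                     # last resort: literal key name
}
# Usage: resolve_key help-infra-server-create es-ES en-US
```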

Adding New Languages

1. Add Locale Configuration

Edit provisioning/locales/i18n-config.toml:

[locales.pt-BR]
name = "Portuguese (Brazil)"
direction = "ltr"
plurals = 2
decimal_separator = ","
thousands_separator = "."
date_format = "DD/MM/YYYY"

[fallback_chains]
pt-BR = ["pt-PT", "es-ES", "en-US"]

Configuration Fields:

  • name: Display name of locale
  • direction: Text direction (ltr/rtl)
  • plurals: Number of plural forms (1-6 depending on language)
  • decimal_separator: Locale-specific decimal format
  • thousands_separator: Number formatting
  • date_format: Locale-specific date format
  • currency_symbol: Currency symbol (optional)
  • currency_position: “prefix” or “suffix” (optional)

2. Create Locale Directory

mkdir -p provisioning/locales/pt-BR

3. Create Translation Files

Copy English files as base:

cp provisioning/locales/en-US/help.ftl provisioning/locales/pt-BR/help.ftl
cp provisioning/locales/en-US/forms.ftl provisioning/locales/pt-BR/forms.ftl

4. Translate Strings

Edit pt-BR/help.ftl and pt-BR/forms.ftl with translated content. Follow naming conventions:

# Help strings: help-{category}-{element}
help-infra-server-create = Criar um novo servidor

# Form prompts: form-{element}-prompt
form-project_name-prompt = Nome do projeto

# Form help: form-{element}-help
form-project_name-help = Nome do projeto (alfanumérico minúsculo com hífens)

# Form options: form-{element}-option-{value}
form-database_type-option-postgres = PostgreSQL (Recomendado)

5. Validate Translation

Check coverage and syntax:

# Validate Fluent file syntax
provisioning i18n validate --locale pt-BR

# Check translation coverage
provisioning i18n coverage --locale pt-BR

# List missing translations
provisioning i18n missing --locale pt-BR

6. Update Documentation

Document new language support in translations_status.md.

Validation & Quality Standards

Translation Quality Rules

Naming Conventions (REQUIRED):

  • Help strings: help-{category}-{element} (e.g., help-infra-server-create)
  • Form prompts: form-{element}-prompt (e.g., form-project_name-prompt)
  • Form help: form-{element}-help (e.g., form-project_name-help)
  • Form placeholders: form-{element}-placeholder
  • Form options: form-{element}-option-{value} (e.g., form-database_type-option-postgres)
  • Section headers: section-{name}-title

Coverage Requirements:

  • Critical Locales: en-US, es-ES require 95% minimum coverage
  • Warning Threshold: 80% triggers warnings during build
  • Incomplete Locales: 0% coverage allowed (inherit via fallback chain)
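
A rough coverage check can be sketched in shell (the supported tool is `provisioning i18n coverage`; this sketch merely compares key names between two .ftl files and assumes one key per line):

```shell
# Hedged sketch: percentage of en-US keys that also exist in a locale file.
coverage_pct() {
  base_keys=$(mktemp); loc_keys=$(mktemp)
  cut -d'=' -f1 "$1" | tr -d ' ' | sort -u > "$base_keys"
  cut -d'=' -f1 "$2" | tr -d ' ' | sort -u > "$loc_keys"
  total=$(wc -l < "$base_keys")
  have=$(comm -12 "$base_keys" "$loc_keys" | wc -l)
  rm -f "$base_keys" "$loc_keys"
  echo $((100 * have / total))
}
# Usage: coverage_pct locales/en-US/forms.ftl locales/pt-BR/forms.ftl
```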

Testing Localization

Test translations via different methods:

# Test help system in Spanish
LANG=es_ES provisioning help infrastructure

# Test form display in Spanish
LANG=es_ES provisioning setup profile

# Validate all translation files
provisioning i18n validate --all

# Generate coverage report
provisioning i18n coverage --format=json > coverage.json

Implementation Details

TypeDialog Integration

TypeDialog forms reference Fluent keys via locales_path configuration:

# In form.toml
locales_path = "../../../locales"

[[elements]]
name = "project_name"
prompt = "form-project_name-prompt"    # References: locales/*/forms.ftl
help = "form-project_name-help"
placeholder = "form-project_name-placeholder"

Resolution Process:

  1. Read locales_path from form configuration
  2. Check LANG environment variable (converted to locale format: es_ES → es-ES)
  3. Load Fluent file (e.g., locales/es-ES/forms.ftl)
  4. Resolve string key → value
  5. If key missing, follow fallback chain
  6. If still missing, use literal key name

Help System Integration

Help system uses Fluent catalog loader in provisioning/core/nulib/main_provisioning/help_system.nu:

# Load help strings for current locale
let help_strings = (load_fluent_catalog $locale)

# Display localized help text
print ($help_strings | get help-infrastructure-title)

Maintenance

Adding New Translations

When new help text or forms are added:

  1. Add English strings to en-US/help.ftl or en-US/forms.ftl
  2. Add Spanish translations to es-ES/help.ftl or es-ES/forms.ftl
  3. Run validation: provisioning i18n validate
  4. Update translations_status.md with new counts
  5. If coverage drops below 95%, fix before release

Updating Existing Translations

To modify existing translated string:

  1. Edit key in en-US/*.ftl and all locale-specific files
  2. Run validation to ensure consistency
  3. Test in both languages: LANG=en_US provisioning help and LANG=es_ES provisioning help

Current Translation Status

Last Updated: 2026-01-13 | Status: 100% Complete

String Count

| Component | en-US | es-ES | Status |
|-----------|-------|-------|--------|
| Help System | 65 | 65 | ✅ Complete |
| Forms | 180 | 180 | ✅ Complete |
| Total | 245 | 245 | ✅ Complete |

Features Enabled

| Feature | Status | Purpose |
|---------|--------|---------|
| Pluralization | ✅ Enabled | Support language-specific plural rules |
| Number Formatting | ✅ Enabled | Locale-specific number/currency formatting |
| Date Formatting | ✅ Enabled | Locale-specific date display |
| Fallback Chains | ✅ Enabled | Automatic fallback to English |
| Gender Agreement | ⚠️ Disabled | Not needed for Spanish help strings |
| RTL Support | ⚠️ Disabled | No RTL languages configured yet |


Operations

Production deployment, monitoring, maintenance, and operational best practices for running Provisioning infrastructure at scale.

Overview

This section covers everything needed to operate Provisioning in production:

  • Deployment strategies - Single-cloud, multi-cloud, hybrid with zero-downtime updates
  • Service management - Microservice lifecycle, scaling, health checks, failover
  • Observability - Metrics (Prometheus), logs (ELK), traces (Jaeger), dashboards
  • Incident response - Detection, triage, remediation, postmortem automation
  • Backup & recovery - Strategies, testing, disaster recovery, point-in-time restore
  • Performance optimization - Profiling, caching, scaling, resource optimization
  • Troubleshooting - Debugging, log analysis, diagnostic tools, support

Operational Guides

Deployment and Management

  • Deployment Modes - Single-cloud, multi-cloud, hybrid, canary, blue-green, rolling updates with zero downtime.

  • Service Management - Microservice lifecycle, scaling policies, health checks, graceful shutdown, rolling restarts.

  • Platform Installer - TUI and unattended installation, provider setup, workspace creation, post-install configuration.

Monitoring and Observability

[Diagram: Monitoring stack with Prometheus, Grafana, Fluentd, Elasticsearch, and Alertmanager]

  • Monitoring Setup - Prometheus metrics, Grafana dashboards, alerting rules, SLO monitoring, 12 microservices

  • Logging and Analysis - Centralized logging with ELK Stack, log aggregation, filtering, searching, performance analysis.

  • Distributed Tracing - Jaeger integration, span collection, trace visualization, latency analysis across microservices.

Resilience and Recovery

  • Incident Response - Severity levels, triage, investigation, mitigation, escalation, postmortem

  • Backup Strategies - Full, incremental, PITR backups with RTO/RPO targets, testing procedures, recovery workflows.

  • Disaster Recovery - DR planning, failover procedures, failback strategies, RTO/RPO targets, testing schedules.

[Diagram: Disaster recovery topology with multi-region failover between primary and backup]

Troubleshooting

  • Troubleshooting Guide - Common issues, debugging techniques, log analysis, diagnostic tools, support resources.

  • Platform Health - Health check procedures, system status, component status, SLO metrics, error budgets.

Operational Workflows

I’m deploying to production

Follow: Deployment ModesService ManagementMonitoring Setup

I need to monitor infrastructure

Setup: Monitoring Setup for metrics, Logging and Analysis for logs, Distributed Tracing for traces

I’m experiencing an incident

Execute: Incident Response with triage, investigation, mitigation, escalation

I need to backup and recover

Implement: Backup Strategies with testing, Disaster Recovery for major outages

I need to optimize performance

Follow: Performance Optimization for profiling and tuning

I need help troubleshooting

Consult: Troubleshooting Guide for common issues and solutions

Deployment Architecture

Development
  ↓
Staging (test all)
  ↓
Canary (1% traffic)
  ↓
Rolling (increase % gradually)
  ↓
Production (100%)

SLO Targets

| Service | Availability | P99 Latency | Error Budget |
|---------|--------------|-------------|--------------|
| API Gateway | 99.99% | <100ms | 4m 26s/month |
| Orchestrator | 99.9% | <500ms | 43m 46s/month |
| Control-Center | 99.95% | <300ms | 21m 56s/month |
| Detector | 99.5% | <2s | 3h 36s/month |
| All Others | 99.9% | <1s | 43m 46s/month |

Monitoring Stack

  • Metrics - Prometheus (15s scrape interval, 15d retention)
  • Logs - ELK Stack (Elasticsearch, Logstash, Kibana) with 30d retention
  • Traces - Jaeger (sampling 10%, 24h retention)
  • Dashboards - Grafana with pre-built dashboards per microservice
  • Alerting - AlertManager with escalation rules and notification channels
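
A matching scrape configuration might look like the following (job names and targets are assumptions; retention is configured on the Prometheus server itself):

```yaml
# Hedged sketch of prometheus.yml for the stated 15s scrape interval.
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: provisioning-orchestrator
    static_configs:
      - targets: ['localhost:8080']   # orchestrator /metrics endpoint
```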

Operational Commands

# Check system health
provisioning status health

# View metrics
provisioning metrics view --service orchestrator

# Check SLO status
provisioning slo status

# Run diagnostics
provisioning diagnose system

# Backup infrastructure
provisioning backup create --name daily-$(date +%Y%m%d)

# Restore from backup
provisioning backup restore --backup-id <backup-id>

Related Documentation

  • Architecture → See provisioning/docs/src/architecture/
  • Features → See provisioning/docs/src/features/
  • Development → See provisioning/docs/src/development/
  • Security → See provisioning/docs/src/security/
  • Examples → See provisioning/docs/src/examples/

Deployment Modes

The Provisioning platform supports three deployment modes designed for different operational contexts: interactive TUI for guided setup, headless CLI for automation, and unattended mode for CI/CD pipelines.

Overview

Deployment modes determine how the platform installer and orchestrator interact with the environment:

| Mode | Use Case | User Interaction | Configuration | Rollback |
|------|----------|------------------|---------------|----------|
| Interactive TUI | First-time setup, exploration | Full interactive terminal UI | Guided wizard | Manual intervention |
| Headless CLI | Scripted automation | Command-line flags only | Pre-configured files | Automatic checkpoint |
| Unattended | CI/CD pipelines | Zero interaction | Config file required | Automatic rollback |

Interactive TUI Mode

Beautiful terminal user interface for guided platform installation and configuration.

When to Use

  • First-time platform installation
  • Exploring configuration options
  • Learning platform features
  • Development and testing environments
  • Manual infrastructure provisioning

Features

Seven interactive screens with real-time validation:

  1. Welcome Screen - Platform overview and prerequisites check
  2. Deployment Mode Selection - Solo, MultiUser, CICD, Enterprise
  3. Component Selection - Choose platform services to install
  4. Configuration Builder - Interactive settings editor
  5. Provider Setup - Cloud provider credentials and configuration
  6. Review and Confirm - Summary before installation
  7. Installation Progress - Real-time tracking with checkpoint recovery

Starting Interactive Mode

# Launch interactive installer
provisioning-installer

# Or via main CLI
provisioning install --mode tui

Keyboard navigation:

Tab/Shift+Tab - Navigate fields
Enter - Select/confirm
Esc - Cancel/go back
Arrow keys - Navigate lists
Space - Toggle checkboxes
Ctrl+C - Exit installer

Headless CLI Mode

Command-line interface for scripted automation without graphical interface.

When to Use

  • Automated deployment scripts
  • Remote server installation via SSH
  • Reproducible infrastructure provisioning
  • Configuration management systems
  • Batch deployments across multiple servers

Features

  • Non-interactive installation
  • Configuration via command-line flags
  • Pre-validation of all inputs
  • Structured JSON/YAML output
  • Exit codes for script integration
  • Checkpoint-based recovery

Command Syntax

provisioning-installer --headless \
  --mode <solo|multiuser|cicd|enterprise> \
  --components <comma-separated-list> \
  --storage-path <path> \
  --database <backend> \
  --log-level <level> \
  [--yes] \
  [--config <file>]

Example Deployments

Solo developer setup:

provisioning-installer --headless \
  --mode solo \
  --components orchestrator,control-center \
  --yes

CI/CD pipeline deployment:

provisioning-installer --headless \
  --mode cicd \
  --components orchestrator,vault-service \
  --database surrealdb \
  --yes

Enterprise production deployment:

provisioning-installer --headless \
  --mode enterprise \
  --config /etc/provisioning/enterprise.toml \
  --yes

Unattended Mode

Zero-interaction deployment for fully automated CI/CD pipelines.

When to Use

  • Continuous integration pipelines
  • Continuous deployment workflows
  • Infrastructure as Code provisioning
  • Automated testing environments
  • Container image builds
  • Cloud instance initialization

Requirements

  1. Configuration file must exist and be valid
  2. All required dependencies must be installed
  3. Sufficient system resources must be available
  4. Network connectivity to required services
  5. Appropriate file system permissions
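
Those requirements can be pre-checked with a small wrapper before invoking the installer (a sketch; the installer performs its own validation, and the paths shown are assumptions):

```shell
# Hedged pre-flight sketch for unattended installs.
preflight() {
  config="$1"; tool="$2"
  [ -f "$config" ] || { echo "missing config: $config"; return 1; }
  command -v "$tool" >/dev/null 2>&1 || { echo "missing dependency: $tool"; return 2; }
  echo "pre-flight OK"
}
# Usage: preflight /etc/provisioning/ci-config.toml nu \
#   && provisioning-installer --unattended --config /etc/provisioning/ci-config.toml
```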

Command Syntax

provisioning-installer --unattended --config <config-file>

Example CI/CD Integrations

GitHub Actions workflow:

name: Deploy Provisioning Platform
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Install prerequisites
        run: |
          curl -sSL https://install.nushell.sh | sh
          curl -sSL https://install.nickel-lang.org | sh

      - name: Deploy provisioning platform
        env:
          PROVISIONING_DB_PASSWORD: ${{ secrets.DB_PASSWORD }}
          UPCLOUD_API_TOKEN: ${{ secrets.UPCLOUD_TOKEN }}
        run: |
          provisioning-installer --unattended --config ci-config.toml

      - name: Verify deployment
        run: |
          curl -f http://localhost:8080/health || exit 1

Resource Requirements by Mode

Solo Mode

Minimum: 2 CPU, 4GB RAM, 20GB disk
Recommended: 4 CPU, 8GB RAM, 50GB disk

MultiUser Mode

Minimum: 4 CPU, 8GB RAM, 50GB disk
Recommended: 8 CPU, 16GB RAM, 100GB disk

CICD Mode

Minimum: 8 CPU, 16GB RAM, 100GB disk
Recommended: 16 CPU, 32GB RAM, 500GB disk

Enterprise Mode

Minimum: 16 CPU, 32GB RAM, 500GB disk
Recommended: 32+ CPU, 64GB+ RAM, 1TB+ disk

Choosing the Right Mode

| Scenario | Recommended Mode | Rationale |
|----------|------------------|-----------|
| First-time installation | Interactive TUI | Guided setup with validation |
| Manual production setup | Interactive TUI | Review all settings before deployment |
| Ansible playbook | Headless CLI | Scriptable without GUI |
| Remote server via SSH | Headless CLI | Works without terminal UI |
| GitHub Actions | Unattended | Zero interaction, strict validation |
| Docker image build | Unattended | Non-interactive environment |

Best Practices

Interactive TUI Mode

  • Review all configuration screens carefully
  • Save configuration for later reuse
  • Document custom settings

Headless CLI Mode

  • Test configuration on development environment first
  • Use --check flag for dry-run validation
  • Store configurations in version control
  • Use environment variables for sensitive data

Unattended Mode

  • Validate configuration files extensively before CI/CD deployment
  • Test rollback behavior in non-production environments
  • Monitor installation logs in real-time
  • Set up alerting for installation failures
  • Use idempotent operations to allow retry

Service Management

Managing the nine core platform services that power the Provisioning infrastructure automation platform.

Platform Services Overview

The platform consists of nine microservices providing execution, management, and supporting infrastructure:

| Service | Purpose | Port | Language | Status |
|---------|---------|------|----------|--------|
| orchestrator | Workflow execution and task scheduling | 8080 | Rust + Nushell | Production |
| control-center | Backend management API with RBAC | 8081 | Rust | Production |
| control-center-ui | Web-based management interface | 8082 | Web | Production |
| mcp-server | AI-powered configuration assistance | 8083 | Nushell | Active |
| ai-service | Machine learning and anomaly detection | 8084 | Rust | Active |
| vault-service | Secrets management and KMS | 8085 | Rust | Production |
| extension-registry | OCI registry for extensions | 8086 | Rust | Planned |
| api-gateway | Unified REST API routing | 8087 | Rust | Planned |
| provisioning-daemon | Background service coordination | 8088 | Rust | Development |

Service Lifecycle Management

Starting Services

Systemd management (production):

# Start individual service
sudo systemctl start provisioning-orchestrator

# Start all platform services
sudo systemctl start provisioning-*

# Enable automatic start on boot
sudo systemctl enable provisioning-orchestrator
sudo systemctl enable provisioning-control-center
sudo systemctl enable provisioning-vault-service

Manual start (development):

# Orchestrator
cd provisioning/platform/crates/orchestrator
cargo run --release

# Control Center
cd provisioning/platform/crates/control-center
cargo run --release

# MCP Server
cd provisioning/platform/crates/mcp-server
nu run.nu

Stopping Services

# Stop individual service
sudo systemctl stop provisioning-orchestrator

# Stop all platform services
sudo systemctl stop provisioning-*

# Graceful shutdown (stop timeout is governed by TimeoutStopSec= in the unit file)
sudo systemctl stop provisioning-orchestrator

Restarting Services

# Restart after configuration changes
sudo systemctl restart provisioning-orchestrator

# Reload configuration without restart
sudo systemctl reload provisioning-control-center

Checking Service Status

# Status of all services
systemctl status provisioning-*

# Detailed status
provisioning platform status

# Health check endpoints
curl http://localhost:8080/health  # Orchestrator
curl http://localhost:8081/health  # Control Center
curl http://localhost:8085/health  # Vault Service

Service Configuration

Configuration Files

Each service reads configuration from hierarchical sources:

/etc/provisioning/config.toml           # System defaults
~/.config/provisioning/user_config.yaml # User overrides
workspace/config/provisioning.yaml      # Workspace config

Orchestrator Configuration

# /etc/provisioning/orchestrator.toml
[server]
host = "0.0.0.0"
port = 8080
workers = 8

[storage]
persistence_dir = "/var/lib/provisioning/orchestrator"
checkpoint_interval = 30

[execution]
max_parallel_tasks = 100
retry_attempts = 3
retry_backoff = "exponential"

[api]
enable_rest = true
enable_grpc = false
auth_required = true

Control Center Configuration

# /etc/provisioning/control-center.toml
[server]
host = "0.0.0.0"
port = 8081

[auth]
jwt_algorithm = "RS256"
access_token_ttl = 900
refresh_token_ttl = 604800

[rbac]
policy_dir = "/etc/provisioning/policies"
reload_interval = 60

Vault Service Configuration

# /etc/provisioning/vault-service.toml
[vault]
backend = "secretumvault"
url = " [http://localhost:8200"](http://localhost:8200")
token_env = "VAULT_TOKEN"

[kms]
envelope_encryption = true
key_rotation_days = 90

Service Dependencies

Understanding service dependencies for proper startup order:

Database (SurrealDB)
  ↓
orchestrator (requires database)
  ↓
vault-service (requires orchestrator)
  ↓
control-center (requires orchestrator + vault)
  ↓
control-center-ui (requires control-center)
  ↓
mcp-server (requires control-center)
  ↓
ai-service (requires mcp-server)

Systemd handles dependencies automatically:

# /etc/systemd/system/provisioning-control-center.service
[Unit]
Description=Provisioning Control Center
After=provisioning-orchestrator.service
Requires=provisioning-orchestrator.service

Service Health Monitoring

Health Check Endpoints

All services expose /health endpoints:

# Check orchestrator health
curl http://localhost:8080/health

# Expected response
{
  "status": "healthy",
  "version": "5.0.0",
  "uptime_seconds": 3600,
  "database": "connected",
  "active_workflows": 5,
  "queued_tasks": 12
}
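
A minimal readiness check against that response shape (field name taken from the example above; a real deployment would use a JSON parser such as jq rather than grep):

```shell
# Hedged sketch: succeed iff a health payload reports "status": "healthy".
is_healthy() {
  grep -q '"status": *"healthy"'
}
# Stand-in payload for: curl -fsS http://localhost:8080/health
printf '{"status": "healthy", "active_workflows": 5}' | is_healthy && echo "service up"
```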

Automated Health Monitoring

Use systemd watchdog for automatic restart on failure:

# /etc/systemd/system/provisioning-orchestrator.service
[Service]
WatchdogSec=30
Restart=on-failure
RestartSec=10

Monitor with provisioning CLI:

# Continuous health monitoring
provisioning platform monitor --interval 5

# Alert on unhealthy services
provisioning platform monitor --alert-email ops@example.com

Log Management

Log Locations

Systemd services log to journald:

# View orchestrator logs
sudo journalctl -u provisioning-orchestrator -f

# View last hour of logs
sudo journalctl -u provisioning-orchestrator --since "1 hour ago"

# View errors only
sudo journalctl -u provisioning-orchestrator -p err

# Export logs to file
sudo journalctl -u provisioning-* > platform-logs.txt

File-based logs:

/var/log/provisioning/orchestrator.log
/var/log/provisioning/control-center.log
/var/log/provisioning/vault-service.log

Log Rotation

Configure logrotate for file-based logs:

# /etc/logrotate.d/provisioning
/var/log/provisioning/*.log {
    daily
    rotate 30
    compress
    delaycompress
    missingok
    notifempty
    create 0644 provisioning provisioning
    sharedscripts
    postrotate
        systemctl reload provisioning-* || true
    endscript
}

Log Levels

Configure log verbosity:

# Set log level via environment
export PROVISIONING_LOG_LEVEL=debug
sudo systemctl restart provisioning-orchestrator

# Or in configuration
provisioning config set logging.level debug

Log levels: trace, debug, info, warn, error

Performance Tuning

Orchestrator Performance

Adjust worker threads and task limits:

[execution]
max_parallel_tasks = 200  # Increase for high throughput
worker_threads = 16       # Match CPU cores
task_queue_size = 1000

[performance]
enable_metrics = true
metrics_interval = 10

Database Connection Pooling

[database]
max_connections = 100
min_connections = 10
connection_timeout = 30
idle_timeout = 600

Memory Limits

Set memory limits via systemd:

[Service]
MemoryMax=4G
MemoryHigh=3G

Service Updates and Upgrades

Zero-Downtime Upgrades

Rolling upgrade procedure:

# 1. Deploy new version alongside old version
sudo cp provisioning-orchestrator /usr/local/bin/provisioning-orchestrator-new

# 2. Update systemd service to use new binary
sudo systemctl daemon-reload

# 3. Graceful restart
sudo systemctl reload provisioning-orchestrator

Version Management

Check running versions:

provisioning platform versions

# Output:
# orchestrator: 5.0.0
# control-center: 5.0.0
# vault-service: 4.0.0

Rollback Procedure

# 1. Stop new version
sudo systemctl stop provisioning-orchestrator

# 2. Restore previous binary
sudo cp /usr/local/bin/provisioning-orchestrator.backup \
       /usr/local/bin/provisioning-orchestrator

# 3. Start service with previous version
sudo systemctl start provisioning-orchestrator

Security Hardening

Service Isolation

Run services with dedicated users:

# Create service user
sudo useradd -r -s /usr/sbin/nologin provisioning

# Set ownership
sudo chown -R provisioning:provisioning /var/lib/provisioning
sudo chown -R provisioning:provisioning /etc/provisioning

Systemd service configuration:

[Service]
User=provisioning
Group=provisioning
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true

Network Security

Restrict service access with firewall:

# Allow only localhost access
sudo ufw allow from 127.0.0.1 to any port 8080
sudo ufw allow from 127.0.0.1 to any port 8081

# Or use systemd socket activation

Troubleshooting Services

Service Won’t Start

Check service status and logs:

systemctl status provisioning-orchestrator
journalctl -u provisioning-orchestrator -n 100

Common issues:

  • Port already in use: Check with lsof -i :8080
  • Configuration error: Validate with provisioning validate config
  • Missing dependencies: Check with ldd /usr/local/bin/provisioning-orchestrator
  • Permission issues: Verify file ownership

High Resource Usage

Monitor resource consumption:

# CPU and memory usage
systemctl status provisioning-orchestrator

# Detailed metrics
provisioning platform metrics --service orchestrator

Adjust limits:

# Increase memory limit
sudo systemctl set-property provisioning-orchestrator MemoryMax=8G

# Reduce parallel tasks
provisioning config set execution.max_parallel_tasks 50
sudo systemctl restart provisioning-orchestrator

Service Crashes

Enable core dumps for debugging:

# Enable core dumps
sudo sysctl -w kernel.core_pattern=/var/crash/core.%e.%p
ulimit -c unlimited

# Analyze crash
sudo coredumpctl list
sudo coredumpctl debug

Service Metrics

Prometheus Integration

Services expose Prometheus metrics:

# Orchestrator metrics
curl http://localhost:8080/metrics

# Example metrics:
# provisioning_workflows_total 1234
# provisioning_workflows_active 5
# provisioning_tasks_queued 12
# provisioning_tasks_completed 9876

Grafana Dashboards

Import pre-built dashboards:

provisioning monitoring install-dashboards

Dashboards available at http://localhost:3000

Best Practices

Service Management

  • Use systemd for production deployments
  • Enable automatic restart on failure
  • Monitor health endpoints continuously
  • Set appropriate resource limits
  • Implement log rotation
  • Regular backup of service data

Configuration Management

  • Version control all configuration files
  • Use hierarchical configuration for flexibility
  • Validate configuration before applying
  • Document all custom settings
  • Use environment variables for secrets

Monitoring and Alerting

  • Monitor all service health endpoints
  • Set up alerts for service failures
  • Track key performance metrics
  • Review logs regularly
  • Establish incident response procedures

Monitoring

Comprehensive observability stack for the Provisioning platform using Prometheus, Grafana, and custom metrics.

Monitoring Stack Overview

The platform monitoring system consists of:

| Component     | Purpose                        | Port | Status     |
| ------------- | ------------------------------ | ---- | ---------- |
| Prometheus    | Metrics collection and storage | 9090 | Production |
| Grafana       | Visualization and dashboards   | 3000 | Production |
| Loki          | Log aggregation                | 3100 | Active     |
| Alertmanager  | Alert routing and notification | 9093 | Production |
| Node Exporter | System metrics                 | 9100 | Production |

Quick Start

Install monitoring stack:

# Install all monitoring components
provisioning monitoring install

# Install specific components
provisioning monitoring install --components prometheus,grafana

# Start monitoring services
provisioning monitoring start

Access dashboards:

  • Grafana: http://localhost:3000
  • Prometheus: http://localhost:9090

Prometheus Configuration

Service Discovery

Prometheus automatically discovers platform services:

# /etc/provisioning/prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'provisioning-orchestrator'
    static_configs:
      - targets: ['localhost:8080']
    metrics_path: '/metrics'

  - job_name: 'provisioning-control-center'
    static_configs:
      - targets: ['localhost:8081']

  - job_name: 'provisioning-vault-service'
    static_configs:
      - targets: ['localhost:8085']

  - job_name: 'node-exporter'
    static_configs:
      - targets: ['localhost:9100']

Retention Configuration

global:
  external_labels:
    cluster: 'provisioning-production'

# Storage retention is configured via command-line flags, not prometheus.yml:
#   --storage.tsdb.retention.time=30d
#   --storage.tsdb.retention.size=50GB

Key Metrics

Platform Metrics

Orchestrator metrics:

provisioning_workflows_total - Total workflows created
provisioning_workflows_active - Currently active workflows
provisioning_workflows_completed - Successfully completed workflows
provisioning_workflows_failed - Failed workflows
provisioning_tasks_queued - Tasks in queue
provisioning_tasks_running - Currently executing tasks
provisioning_tasks_completed - Total completed tasks
provisioning_checkpoint_recoveries - Checkpoint recovery count

Control Center metrics:

provisioning_api_requests_total - Total API requests
provisioning_api_requests_duration_seconds - Request latency histogram
provisioning_auth_attempts_total - Authentication attempts
provisioning_auth_failures_total - Failed authentication attempts
provisioning_rbac_denials_total - Authorization denials

Vault Service metrics:

provisioning_secrets_operations_total - Secret operations count
provisioning_kms_encryptions_total - Encryption operations
provisioning_kms_decryptions_total - Decryption operations
provisioning_kms_latency_seconds - KMS operation latency

System Metrics

Node Exporter provides system-level metrics:

node_cpu_seconds_total - CPU time per core
node_memory_MemAvailable_bytes - Available memory
node_disk_io_time_seconds_total - Disk I/O time
node_network_receive_bytes_total - Network RX bytes
node_network_transmit_bytes_total - Network TX bytes
node_filesystem_avail_bytes - Available disk space

Grafana Dashboards

Pre-built Dashboards

Import platform dashboards:

# Install all pre-built dashboards
provisioning monitoring install-dashboards

# List available dashboards
provisioning monitoring list-dashboards

Available dashboards:

  1. Platform Overview - High-level system status
  2. Orchestrator Performance - Workflow and task metrics
  3. Control Center API - API request metrics and latency
  4. Vault Service KMS - Encryption operations and performance
  5. System Resources - CPU, memory, disk, network
  6. Security Events - Authentication, authorization, audit logs
  7. Database Performance - SurrealDB metrics

Custom Dashboard Creation

Create custom dashboards via Grafana UI or provisioning:

{
  "dashboard": {
    "title": "Custom Infrastructure Dashboard",
    "panels": [
      {
        "title": "Active Workflows",
        "targets": [
          {
            "expr": "provisioning_workflows_active",
            "legendFormat": "Active Workflows"
          }
        ],
        "type": "graph"
      }
    ]
  }
}

Save dashboard:

provisioning monitoring export-dashboard --id 1 --output custom-dashboard.json

Alerting

Alert Rules

Configure alert rules in Prometheus:

# /etc/provisioning/prometheus/alerts/provisioning.yml
groups:
  - name: provisioning_alerts
    interval: 30s
    rules:
      - alert: OrchestratorDown
        expr: up{job="provisioning-orchestrator"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Orchestrator service is down"
          description: "Orchestrator has been down for more than 1 minute"

      - alert: HighWorkflowFailureRate
        expr: |
          rate(provisioning_workflows_failed[5m]) /
          rate(provisioning_workflows_total[5m]) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High workflow failure rate"
          description: "More than 10% of workflows are failing"

      - alert: DatabaseConnectionLoss
        expr: provisioning_database_connected == 0
        for: 30s
        labels:
          severity: critical
        annotations:
          summary: "Database connection lost"

      - alert: HighMemoryUsage
        expr: |
          (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) > 0.9
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage"
          description: "Memory usage is above 90%"

      - alert: DiskSpaceLow
        expr: |
          (node_filesystem_avail_bytes{mountpoint="/var/lib/provisioning"} /
           node_filesystem_size_bytes{mountpoint="/var/lib/provisioning"}) < 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Low disk space"
          description: "Less than 10% disk space available"

Alertmanager Configuration

Route alerts to appropriate channels:

# /etc/provisioning/alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m

route:
  group_by: ['alertname', 'severity']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 12h
  receiver: 'team-email'

  routes:
    - match:
        severity: critical
      receiver: 'pagerduty'
      continue: true

    - match:
        severity: warning
      receiver: 'slack'

receivers:
  - name: 'team-email'
    email_configs:
      - to: 'ops@example.com'
        from: 'alerts@provisioning.example.com'
        smarthost: 'smtp.example.com:587'

  - name: 'pagerduty'
    pagerduty_configs:
      - service_key: '<pagerduty-key>'

  - name: 'slack'
    slack_configs:
      - api_url: '<slack-webhook-url>'
        channel: '#provisioning-alerts'

Test alerts:

# Send test alert
provisioning monitoring test-alert --severity critical

# Silence alerts temporarily
provisioning monitoring silence --duration 2h --reason "Maintenance window"

Log Aggregation with Loki

Loki Configuration

# /etc/provisioning/loki/loki.yml
auth_enabled: false

server:
  http_listen_port: 3100

ingester:
  lifecycler:
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1

schema_config:
  configs:
    - from: 2024-01-01
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h

storage_config:
  boltdb_shipper:
    active_index_directory: /var/lib/loki/boltdb-shipper-active
    cache_location: /var/lib/loki/boltdb-shipper-cache
  filesystem:
    directory: /var/lib/loki/chunks

limits_config:
  retention_period: 720h  # 30 days

Promtail for Log Shipping

# /etc/provisioning/promtail/promtail.yml
server:
  http_listen_port: 9080

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://localhost:3100/loki/api/v1/push

scrape_configs:
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          __path__: /var/log/provisioning/*.log

  - job_name: journald
    journal:
      max_age: 12h
      labels:
        job: systemd-journal
    relabel_configs:
      - source_labels: ['__journal__systemd_unit']
        target_label: 'unit'

Query logs in Grafana:

{job="varlogs"} |= "error"
{unit="provisioning-orchestrator.service"} |= "workflow" | json

Tracing with Tempo

Distributed Tracing

Enable OpenTelemetry tracing in services:

# /etc/provisioning/config.toml
[tracing]
enabled = true
exporter = "otlp"
endpoint = "localhost:4317"
service_name = "provisioning-orchestrator"

Tempo configuration:

# /etc/provisioning/tempo/tempo.yml
server:
  http_listen_port: 3200

distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317

storage:
  trace:
    backend: local
    local:
      path: /var/lib/tempo/traces

query_frontend:
  search:
    enabled: true

View traces in Grafana or Tempo UI.

Performance Monitoring

Query Performance

Monitor slow queries:

# 95th percentile API latency
histogram_quantile(0.95,
  rate(provisioning_api_requests_duration_seconds_bucket[5m])
)

# Slow workflows (>60s)
provisioning_workflow_duration_seconds > 60
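
histogram_quantile estimates a quantile from cumulative bucket counts: it finds the bucket containing the target rank, then interpolates linearly inside it. The same arithmetic in miniature, with invented bucket data:

```shell
# Buckets: upper bound (le) -> cumulative count. 100 observations total, so
# the p95 target rank is 95, which falls in the (1, 5] bucket.
p95=$(awk 'BEGIN {
    n = split("0.1 0.5 1 5 10", le, " ")
    split("50 80 90 98 100", cum, " ")
    target = 0.95 * cum[n]
    for (i = 1; i <= n; i++) {
        if (cum[i] >= target) {
            prev  = (i > 1) ? cum[i-1] : 0   # count below this bucket
            lower = (i > 1) ? le[i-1]  : 0   # lower bound of this bucket
            printf "%.2f", lower + (le[i] - lower) * (target - prev) / (cum[i] - prev)
            exit
        }
    }
}')
echo "p95 ~= ${p95}s"   # prints: p95 ~= 3.50s
```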

Resource Monitoring

Track resource utilization:

# CPU usage per service
rate(process_cpu_seconds_total{job=~"provisioning-.*"}[5m]) * 100

# Memory usage per service
process_resident_memory_bytes{job=~"provisioning-.*"}

# Disk I/O rate
rate(node_disk_io_time_seconds_total[5m])

Custom Metrics

Adding Custom Metrics

Rust services use prometheus crate:

use prometheus::{Counter, Histogram, Registry};

// Create metrics
let workflow_counter = Counter::new(
    "provisioning_custom_workflows",
    "Custom workflow counter"
)?;

let task_duration = Histogram::with_opts(
    HistogramOpts::new("provisioning_task_duration", "Task duration")
        .buckets(vec![0.1, 0.5, 1.0, 5.0, 10.0])
)?;

// Register metrics
registry.register(Box::new(workflow_counter))?;
registry.register(Box::new(task_duration))?;

// Use metrics
workflow_counter.inc();
task_duration.observe(duration_seconds);

Nushell scripts export metrics:

# Export metrics in Prometheus format
def export-metrics [] {
    [
        "# HELP provisioning_custom_metric Custom metric"
        "# TYPE provisioning_custom_metric counter"
        $"provisioning_custom_metric (get-metric-value)"
    ] | str join "\n"
}

Monitoring Best Practices

  • Set appropriate scrape intervals (15-60s)
  • Configure retention based on compliance requirements
  • Use labels for multi-dimensional metrics
  • Create dashboards for key business metrics
  • Set up alerts for critical failures only
  • Document alert thresholds and runbooks
  • Review and tune alerts regularly
  • Use recording rules for expensive queries
  • Archive long-term metrics to object storage

Backup & Recovery

Comprehensive backup strategies and disaster recovery procedures for the Provisioning platform.

Overview

The platform backup strategy covers:

  • Platform service data and state
  • Database backups (SurrealDB)
  • Configuration files and secrets
  • Infrastructure definitions
  • Workflow checkpoints and history
  • Audit logs and compliance data

Backup Components

Critical Data

| Component              | Location                           | Backup Priority | Recovery Time |
| ---------------------- | ---------------------------------- | --------------- | ------------- |
| Database               | /var/lib/provisioning/database     | Critical        | < 15 min      |
| Orchestrator State     | /var/lib/provisioning/orchestrator | Critical        | < 5 min       |
| Configuration          | /etc/provisioning                  | High            | < 5 min       |
| Secrets                | SOPS-encrypted files               | Critical        | < 5 min       |
| Audit Logs             | /var/log/provisioning/audit        | Compliance      | < 30 min      |
| Workspace Data         | workspace/                         | High            | < 15 min      |
| Infrastructure Schemas | provisioning/schemas               | High            | < 10 min      |
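
Before running a backup, it can be worth verifying that the sources in the table exist. A hypothetical pre-flight helper (`check_paths` is not a platform command), demonstrated against a throwaway directory layout:

```shell
# Report any missing backup source paths; return non-zero if any are absent.
check_paths() {
    missing=0
    for p in "$@"; do
        [ -e "$p" ] || { echo "MISSING: $p"; missing=1; }
    done
    return $missing
}

# Throwaway layout standing in for a live installation:
root=$(mktemp -d)
mkdir -p "$root/var/lib/provisioning/database" "$root/etc/provisioning"

if check_paths "$root/var/lib/provisioning/database" "$root/etc/provisioning"; then
    status=ok
else
    status=missing
fi
echo "backup sources: $status"
```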

Backup Strategies

Full Backup

Complete system backup including all components:

# Create full backup
provisioning backup create --type full --output /backups/full-$(date +%Y%m%d).tar.gz

# Full backup includes:
# - Database dump
# - Service configuration
# - Workflow state
# - Audit logs
# - User data

Contents of full backup:

full-20260116.tar.gz
├── database/
│   └── surrealdb-dump.sql
├── config/
│   ├── provisioning.toml
│   ├── orchestrator.toml
│   └── control-center.toml
├── state/
│   ├── workflows/
│   └── checkpoints/
├── logs/
│   └── audit/
├── workspace/
│   ├── infra/
│   └── config/
└── metadata.json
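
The layout above can be assembled and inspected with standard tools. A sketch only; the real `backup create` also produces the database dump and full metadata:

```shell
# Stage a directory tree matching the archive layout, then tar it up.
stage=$(mktemp -d)
mkdir -p "$stage/database" "$stage/config" "$stage/state/workflows" \
         "$stage/logs/audit" "$stage/workspace/infra"
echo '{"type":"full"}' > "$stage/metadata.json"

tar -czf "$stage.tar.gz" -C "$stage" .

# Confirm the archive contains the metadata file.
listed=$(tar -tzf "$stage.tar.gz" | grep -c 'metadata.json')
echo "metadata entries in archive: $listed"
```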

Incremental Backup

Backup only changed data since last backup:

# Incremental backup (faster, smaller)
provisioning backup create --type incremental --since-backup full-20260116

# Incremental backup includes:
# - New workflows since last backup
# - Configuration changes
# - New audit log entries
# - Modified workspace files

Continuous Backup

Real-time backup of critical data:

# Enable continuous backup
provisioning backup enable-continuous --destination s3://backups/continuous

# WAL archiving for database
# Real-time checkpoint backup
# Audit log streaming

Backup Commands

Create Backup

# Full backup to local directory
provisioning backup create --type full --output /backups

# Incremental backup
provisioning backup create --type incremental

# Backup specific components
provisioning backup create --components database,config

# Compressed backup
provisioning backup create --compress gzip

# Encrypted backup
provisioning backup create --encrypt --key-file /etc/provisioning/backup.key

List Backups

# List all backups
provisioning backup list

# Output:
# NAME                  TYPE         SIZE    DATE                STATUS
# full-20260116        Full         2.5GB   2026-01-16 10:00   Complete
# incr-20260116-1200   Incremental  150MB   2026-01-16 12:00   Complete
# full-20260115        Full         2.4GB   2026-01-15 10:00   Complete

Restore Backup

# Restore full backup
provisioning backup restore --backup full-20260116 --confirm

# Restore specific components
provisioning backup restore --backup full-20260116 --components database

# Point-in-time restore
provisioning backup restore --timestamp "2026-01-16 09:30:00"

# Dry-run restore
provisioning backup restore --backup full-20260116 --dry-run

Verify Backup

# Verify backup integrity
provisioning backup verify --backup full-20260116

# Test restore in isolated environment
provisioning backup test-restore --backup full-20260116

Automated Backup Scheduling

Cron-based Backups

# Install backup cron jobs
provisioning backup schedule install

# Default schedule:
# Full backup: Daily at 2 AM
# Incremental: Every 6 hours
# Cleanup old backups: Weekly

Crontab entries:

# Full daily backup
0 2 * * * /usr/local/bin/provisioning backup create --type full --output /backups

# Incremental every 6 hours
0 */6 * * * /usr/local/bin/provisioning backup create --type incremental

# Cleanup backups older than 30 days
0 3 * * 0 /usr/local/bin/provisioning backup cleanup --older-than 30d

Systemd Timer-based Backups

# /etc/systemd/system/provisioning-backup.timer
[Unit]
Description=Provisioning Platform Backup Timer

[Timer]
OnCalendar=*-*-* 02:00:00
Persistent=true

[Install]
WantedBy=timers.target

# /etc/systemd/system/provisioning-backup.service
[Unit]
Description=Provisioning Platform Backup

[Service]
Type=oneshot
ExecStart=/usr/local/bin/provisioning backup create --type full
User=provisioning

Enable timer:

sudo systemctl enable provisioning-backup.timer
sudo systemctl start provisioning-backup.timer

Backup Destinations

Local Filesystem

# Backup to local directory
provisioning backup create --output /mnt/backups

Remote Storage

S3-compatible storage:

# Backup to S3
provisioning backup create --destination s3://my-bucket/backups \
  --s3-region us-east-1

# Backup to MinIO
provisioning backup create --destination s3://backups \
  --s3-endpoint http://minio.local:9000

Network filesystem:

# Backup to NFS mount
provisioning backup create --output /mnt/nfs/backups

# Backup to SMB share
provisioning backup create --output /mnt/smb/backups

Off-site Backup

Rsync to remote server:

# Backup and sync to remote
provisioning backup create --output /backups
rsync -avz /backups/ backup-server:/backups/provisioning/

Database Backup

SurrealDB Backup

# Export database
surreal export --conn http://localhost:8000 \
  --user root --pass root \
  --ns provisioning --db main \
  /backups/database-$(date +%Y%m%d).surql

# Import database
surreal import --conn http://localhost:8000 \
  --user root --pass root \
  --ns provisioning --db main \
  /backups/database-20260116.surql

Automated Database Backups

# Enable automatic database backups
provisioning backup database enable --interval daily

# Backup with point-in-time recovery
provisioning backup database create --enable-pitr

Disaster Recovery

Recovery Procedures

Complete platform recovery from backup:

# 1. Stop all services
sudo systemctl stop provisioning-*

# 2. Restore database
provisioning backup restore --backup full-20260116 --components database

# 3. Restore configuration
provisioning backup restore --backup full-20260116 --components config

# 4. Restore service state
provisioning backup restore --backup full-20260116 --components state

# 5. Verify data integrity
provisioning validate-installation

# 6. Start services
sudo systemctl start provisioning-*

# 7. Verify services
provisioning platform status

Recovery Time Objectives

| Scenario            | RTO     | RPO      | Procedure                       |
| ------------------- | ------- | -------- | ------------------------------- |
| Service failure     | 5 min   | 0        | Restart service from checkpoint |
| Database corruption | 15 min  | 6 hours  | Restore from incremental backup |
| Complete data loss  | 30 min  | 24 hours | Restore from full backup        |
| Site disaster       | 2 hours | 24 hours | Restore from off-site backup    |

Point-in-Time Recovery

Restore to specific timestamp:

# List available recovery points
provisioning backup list-recovery-points

# Restore to specific time
provisioning backup restore --timestamp "2026-01-16 09:30:00"

# Recovery with workflow replay
provisioning backup restore --timestamp "2026-01-16 09:30:00" --replay-workflows

Backup Encryption

SOPS Encryption

Encrypt backups with SOPS:

# Create encrypted backup
provisioning backup create --encrypt sops --key-file /etc/provisioning/age.key

# Restore encrypted backup
provisioning backup restore --backup encrypted-20260116.tar.gz.enc \
  --decrypt sops --key-file /etc/provisioning/age.key

Age Encryption

# Generate age key pair
age-keygen -o /etc/provisioning/backup-key.txt

# Create encrypted backup with age
provisioning backup create --encrypt age --recipient "age1..."

# Decrypt and restore
age -d -i /etc/provisioning/backup-key.txt backup.tar.gz.age | \
  provisioning backup restore --stdin

Backup Retention

Retention Policies

# /etc/provisioning/backup-retention.toml
[retention]
# Keep daily backups for 7 days
daily = 7

# Keep weekly backups for 4 weeks
weekly = 4

# Keep monthly backups for 12 months
monthly = 12

# Keep yearly backups for 7 years (compliance)
yearly = 7

Apply retention policy:

# Cleanup old backups according to policy
provisioning backup cleanup --policy /etc/provisioning/backup-retention.toml
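
The daily tier of such a policy reduces to "keep the newest N, delete the rest". A simplified sketch (the real cleanup also applies the weekly, monthly, and yearly tiers):

```shell
# Backup names embed their date, so lexical sort order is chronological.
backups="full-20260110 full-20260111 full-20260112 full-20260113
full-20260114 full-20260115 full-20260116 full-20260117 full-20260118"

# The newest 7 are kept; everything from line 8 onward is a deletion candidate.
candidates=$(echo "$backups" | tr ' ' '\n' | sort -r | tail -n +8)
echo "would delete:"
echo "$candidates"
```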

Backup Monitoring

Backup Alerts

Configure alerts for backup failures:

# Prometheus alert for failed backups
- alert: BackupFailed
  expr: provisioning_backup_status{status="failed"} > 0
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "Backup failed"
    description: "Backup has failed, investigate immediately"

Backup Metrics

Monitor backup health:

# Backup success rate
provisioning_backup_success_rate{type="full"} 1.0

# Time since last backup
time() - provisioning_backup_last_success_timestamp > 86400

# Backup size trend
increase(provisioning_backup_size_bytes[7d])
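
The freshness rule above (alert when the last success is more than a day old) is plain timestamp arithmetic. The same check in shell, with an illustrative timestamp:

```shell
# Alert if the last successful backup is older than 24 hours (86400 s).
now=$(date +%s)
last_success=$(( now - 3600 ))   # illustrative: last backup one hour ago
age=$(( now - last_success ))

if [ "$age" -gt 86400 ]; then
    state=STALE
else
    state=fresh
fi
echo "last backup ${age}s ago: $state"
```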

Testing Recovery Procedures

Regular DR Drills

# Automated disaster recovery test
provisioning backup test-recovery --backup full-20260116 \
  --test-environment isolated

# Steps performed:
# 1. Spin up isolated test environment
# 2. Restore backup
# 3. Verify data integrity
# 4. Run smoke tests
# 5. Generate test report
# 6. Teardown test environment

Schedule monthly DR tests:

# Monthly disaster recovery drill
0 4 1 * * /usr/local/bin/provisioning backup test-recovery --latest

Best Practices

  • Implement 3-2-1 backup rule: 3 copies, 2 different media, 1 off-site
  • Encrypt all backups containing sensitive data
  • Test restore procedures regularly (monthly minimum)
  • Monitor backup success/failure metrics
  • Automate backup verification
  • Document recovery procedures and RTO/RPO
  • Maintain off-site backups for disaster recovery
  • Use incremental backups to reduce storage costs
  • Version control infrastructure schemas separately
  • Retain audit logs per compliance requirements (7 years)
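
The 3-2-1 rule from the first bullet, sketched with directories standing in for separate media (the off-site leg is shown as a comment because it needs a real remote):

```shell
# Copy 1: the primary backup location.
primary=$(mktemp -d)
echo "backup-data" > "$primary/full-20260116.tar.gz"

# Copy 2: a second medium; in practice a different disk or NAS mount.
secondary=$(mktemp -d)
cp -a "$primary/." "$secondary/"

# Copy 3: off-site, e.g. rsync -avz "$primary/" backup-server:/backups/
# (not executed in this sketch).

[ -f "$secondary/full-20260116.tar.gz" ] && echo "secondary copy present"
```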

Upgrading Provisioning

Upgrade Provisioning to a new version with minimal downtime and automatic rollback support.

Overview

Provisioning supports two upgrade strategies:

  1. In-Place Upgrade - Update existing installation
  2. Side-by-Side Upgrade - Run new version alongside old, switch when ready

Both strategies support automatic rollback on failure.

Before Upgrading

Check Current Version

provisioning version

# Example output:
# Provisioning v5.0.0
# Nushell 0.109.0
# Nickel 1.15.1
# SOPS 3.10.2
# Age 1.2.1

Backup Configuration

# Backup entire workspace
provisioning workspace backup

# Backup specific configuration
provisioning config backup

# Backup state
provisioning state backup

Check Changelog

# View latest changes
provisioning changelog

# Check upgrade path
provisioning version --check-upgrade

# Show upgrade recommendations
provisioning upgrade --check

Verify System Health

# Health check
provisioning health check

# Check all services
provisioning platform health

# Verify provider connectivity
provisioning providers test --all

# Validate configuration
provisioning validate config --strict

Upgrade Methods

Method 1: In-Place Upgrade

Upgrade the existing installation in place with minimal downtime:

# Check upgrade compatibility
provisioning upgrade --check

# List breaking changes
provisioning upgrade --breaking-changes

# Show migration guide (if any)
provisioning upgrade --show-migration

# Perform upgrade
provisioning upgrade

Process:

  1. Validate current installation
  2. Download new version
  3. Run migration scripts (if needed)
  4. Restart services
  5. Verify health
  6. Keep old version for rollback (24 hours)

Method 2: Side-by-Side Upgrade

Run new version alongside old version for testing:

# Create staging installation
provisioning upgrade --staging --version v5.1.0

# Test new version
provisioning --staging server list

# Run test suite
provisioning --staging test suite

# Switch to new version
provisioning upgrade --activate

# Remove old version (after confirmation)
provisioning upgrade --cleanup-old

Advantages:

  • Test new version before switching
  • Zero downtime during upgrade
  • Easy rollback to previous version
  • Run both versions simultaneously

Upgrade Process

Step 1: Pre-Upgrade Checks

# Check system requirements
provisioning setup validate

# Verify dependencies are up-to-date
provisioning version --check-dependencies

# Check disk space (minimum 2GB required)
df -h /

# Verify all services healthy
provisioning platform health

Step 2: Backup Data

# Backup entire workspace
provisioning workspace backup --compress

# Backup orchestrator state
provisioning orchestrator backup

# Backup configuration
provisioning config backup

# Verify backup
provisioning backup list
provisioning backup verify --latest

Step 3: Download New Version

# Check available versions
provisioning version --available

# Download specific version
provisioning upgrade --download v5.1.0

# Verify download
provisioning upgrade --verify-download v5.1.0

# Check size
provisioning upgrade --show-size v5.1.0

Step 4: Run Migration Scripts

# Show required migrations
provisioning upgrade --show-migrations

# Test migration (dry-run)
provisioning upgrade --dry-run

# Run migrations
provisioning upgrade --migrate

# Verify migration
provisioning upgrade --verify-migration

Step 5: Perform Upgrade

# Stop orchestrator gracefully
provisioning orchestrator stop --graceful

# Install new version
provisioning upgrade --install

# Verify installation
provisioning version
provisioning validate config

# Start services
provisioning orchestrator start

Step 6: Verify Upgrade

# Check version
provisioning version

# Health check
provisioning health check

# Run test suite
provisioning test quick

# Verify provider connectivity
provisioning providers test --all

# Check orchestrator status
provisioning orchestrator status

Breaking Changes

Some upgrades may include breaking changes. Check before upgrading:

# List breaking changes
provisioning upgrade --breaking-changes

# Show migration guide
provisioning upgrade --migration-guide v5.1.0

# Generate migration script
provisioning upgrade --generate-migration v5.1.0 > migrate.nu

Common Migration Scenarios

Scenario 1: Configuration Format Change

If configuration format changes (e.g., TOML → YAML):

# Export old format
provisioning config export --format toml > config.old.toml

# Run migration
provisioning upgrade --migrate-config

# Verify new format
provisioning config export --format yaml | head -20

Scenario 2: Schema Updates

If infrastructure schemas change:

# Validate against new schema
nickel typecheck workspace/infra/*.ncl

# Update schemas if needed
provisioning upgrade --update-schemas

# Regenerate configurations
provisioning config regenerate

# Validate updated config
provisioning validate config --strict

Scenario 3: Provider API Changes

If provider APIs change:

# Test provider connectivity with new version
provisioning providers test upcloud --verbose

# Check provider configuration
provisioning config show --section providers.upcloud

# Update provider configuration if needed
provisioning providers configure upcloud

# Verify connectivity
provisioning server list

Rollback Procedure

Automatic Rollback

If upgrade fails, automatic rollback occurs:

# Monitor rollback progress
provisioning upgrade --watch

# Check rollback status
provisioning upgrade --status

# View rollback logs
provisioning upgrade --logs

Manual Rollback

If needed, manually rollback to previous version:

# List available versions for rollback
provisioning upgrade --rollback-candidates

# Rollback to specific version
provisioning upgrade --rollback v5.0.0

# Verify rollback
provisioning version
provisioning platform health

# Restore from backup
provisioning backup restore --backup-id=<id>

Batch Workflow Handling

If you have running batch workflows:

# Check running workflows
provisioning workflow list --status running

# Graceful shutdown (wait for completion)
provisioning workflow shutdown --graceful

# Force shutdown (immediate)
provisioning workflow shutdown --force

# Resume workflows after upgrade
provisioning workflow resume

Troubleshooting Upgrades

Upgrade Hangs

# Check logs
tail -f ~/.provisioning/logs/upgrade.log

# Monitor process
provisioning upgrade --monitor

# Stop upgrade gracefully
provisioning upgrade --stop --graceful

# Force stop
provisioning upgrade --stop --force

Migration Failure

# Check migration logs
provisioning upgrade --migration-logs

# Rollback to previous version
provisioning upgrade --rollback

# Restore from backup
provisioning backup restore

Service Won’t Start

# Check service logs
provisioning platform logs

# Verify configuration
provisioning validate config --strict

# Restore configuration from backup
provisioning config restore

# Restart services
provisioning orchestrator start

Upgrade Scheduling

Schedule Automated Upgrade

# Schedule upgrade for specific time
provisioning upgrade --schedule "2026-01-20T02:00:00"

# Schedule for next maintenance window
provisioning upgrade --schedule-next-maintenance

# Cancel scheduled upgrade
provisioning upgrade --cancel-scheduled

Unattended Upgrade

For CI/CD environments:

# Non-interactive upgrade
provisioning upgrade --yes --no-confirm

# Upgrade with timeout
provisioning upgrade --timeout 3600

# Skip backup
provisioning upgrade --skip-backup

# Continue even if health checks fail
provisioning upgrade --force-upgrade

Version Management

Version Constraints

Pin versions for workspace reproducibility:

# workspace/versions.ncl
{
  provisioning = "5.0.0"
  nushell = "0.109.0"
  nickel = "1.15.1"
  sops = "3.10.2"
  age = "1.2.1"
}

Enforce version constraints:

# Check version compliance
provisioning version --check-constraints

# Enforce constraint
provisioning version --strict-mode

Vendor Versions

Pin provider and task service versions:

# workspace/infra/versions.ncl
{
  providers = {
    upcloud = "2.0.0"
    aws = "5.0.0"
  }
  taskservs = {
    kubernetes = "1.28.0"
    postgres = "14.0"
  }
}

Best Practices

1. Plan Upgrades

  • Schedule during maintenance windows
  • Test in staging first
  • Communicate with team
  • Have rollback plan ready

2. Backup Everything

# Complete backup before upgrade
provisioning workspace backup --compress
provisioning config backup
provisioning state backup

3. Test Before Upgrading

# Use side-by-side upgrade to test
provisioning upgrade --staging
provisioning test suite

4. Monitor After Upgrade

# Watch orchestrator
provisioning orchestrator status --watch

# Monitor platform health
provisioning platform monitor

# Check logs
tail -f ~/.provisioning/logs/provisioning.log

5. Document Changes

# Record what changed
provisioning upgrade --changelog > UPGRADE.md

# Update team documentation
# Update runbooks
# Update dashboards

Upgrade Policies

Automatic Updates

Enable automatic updates:

# ~/.config/provisioning/user_config.yaml
upgrade:
  auto_update: true
  check_interval: "daily"
  update_channel: "stable"
  auto_backup: true

Update Channels

Choose update channel:

# Stable releases (recommended)
provisioning upgrade --channel stable

# Beta releases
provisioning upgrade --channel beta

# Development (nightly)
provisioning upgrade --channel development

Troubleshooting

Common issues, debugging procedures, and resolution strategies for the Provisioning platform.

Quick Diagnosis

Run platform diagnostics:

# Comprehensive health check
provisioning diagnose

# Check specific component
provisioning diagnose --component orchestrator

# Generate diagnostic report
provisioning diagnose --report /tmp/diagnostics.txt

Common Issues

Services Won’t Start

Symptom: Service fails to start or crashes immediately

Diagnosis:

# Check service status
systemctl status provisioning-orchestrator

# View recent logs
journalctl -u provisioning-orchestrator -n 100 --no-pager

# Check configuration
provisioning validate config

Common Causes:

  1. Port already in use
# Find process using port
lsof -i :8080

# Kill conflicting process or change port in config
  2. Configuration error
# Validate configuration
provisioning validate config --strict

# Check for syntax errors
nickel typecheck /etc/provisioning/config.ncl
  3. Missing dependencies
# Check binary dependencies
ldd /usr/local/bin/provisioning-orchestrator

# Install missing libraries
sudo apt install <missing-library>
  4. Permission issues
# Fix ownership
sudo chown -R provisioning:provisioning /var/lib/provisioning
sudo chown -R provisioning:provisioning /etc/provisioning

# Fix permissions
sudo chmod 750 /var/lib/provisioning
sudo chmod 640 /etc/provisioning/*.toml

Database Connection Failures

Symptom: Services can’t connect to SurrealDB

Diagnosis:

# Check database status
systemctl status surrealdb

# Test database connectivity
curl http://localhost:8000/health

# Check database logs
journalctl -u surrealdb -n 50

Resolution:

# Restart database
sudo systemctl restart surrealdb

# Verify connection string in config
provisioning config get database.url

# Test manual connection
surreal sql --conn http://localhost:8000 --user root --pass root

High Resource Usage

Symptom: Service consuming excessive CPU or memory

Diagnosis:

# Monitor resource usage
top -p $(pgrep provisioning-orchestrator)

# Detailed metrics
provisioning platform metrics --service orchestrator

# Check for resource leaks

Resolution:

# Adjust worker threads
provisioning config set execution.worker_threads 4

# Reduce parallel tasks
provisioning config set execution.max_parallel_tasks 50

# Increase memory limit
sudo systemctl set-property provisioning-orchestrator MemoryMax=8G

# Restart service
sudo systemctl restart provisioning-orchestrator

Workflow Failures

Symptom: Workflows fail or hang

Diagnosis:

# List failed workflows
provisioning workflow list --status failed

# View workflow details
provisioning workflow show <workflow-id>

# Check workflow logs
provisioning workflow logs <workflow-id>

# Inspect checkpoint state
provisioning workflow checkpoints <workflow-id>

Common Issues:

  1. Provider API errors
# Check provider credentials
provisioning provider validate upcloud

# Test provider connectivity
provisioning provider test upcloud
  2. Dependency resolution failures
# Validate infrastructure schema
provisioning validate infra my-cluster.ncl

# Check task service dependencies
provisioning taskserv deps kubernetes
  3. Timeout issues
# Increase timeout
provisioning config set workflows.task_timeout 600

# Enable detailed logging
provisioning config set logging.level debug
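Transient provider errors are usually retried with exponential backoff rather than a single long timeout. A sketch of the pattern (the flaky call is simulated, and the sleep is a no-op so the example runs instantly):

```shell
attempts=0
flaky_call() {                 # simulated: fails twice, then succeeds
  attempts=$((attempts + 1))
  [ "$attempts" -ge 3 ]
}

delay=1
while ! flaky_call; do
  : # a real script would: sleep "$delay"
  delay=$((delay * 2))         # back off: 1s, 2s, 4s, ...
done
echo "succeeded on attempt $attempts"
```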

Network Connectivity Issues

Symptom: Can’t reach external services or cloud providers

Diagnosis:

# Test network connectivity
ping -c 3 upcloud.com

# Check DNS resolution
nslookup api.upcloud.com

# Test HTTPS connectivity
curl -v https://api.upcloud.com

# Check proxy settings
env | grep -i proxy

Resolution:

# Configure proxy if needed
export HTTPS_PROXY=http://proxy.example.com:8080
provisioning config set network.proxy http://proxy.example.com:8080

# Verify firewall rules
sudo ufw status

# Check routing
ip route show

Authentication Failures

Symptom: API requests fail with 401 Unauthorized

Diagnosis:

# Check JWT token
provisioning auth status

# Verify user credentials
provisioning auth whoami

# Check authentication logs
journalctl -u provisioning-control-center | grep "auth"

Resolution:

# Refresh authentication token
provisioning auth login --username admin

# Reset user password
provisioning auth reset-password --username admin

# Verify MFA configuration
provisioning auth mfa status

Debugging Workflows

Enable Debug Logging

# Enable debug mode
export PROVISIONING_LOG_LEVEL=debug
provisioning workflow create my-cluster --debug

# Or in configuration
provisioning config set logging.level debug
sudo systemctl restart provisioning-orchestrator

Workflow State Inspection

# View workflow state
provisioning workflow state <workflow-id>

# Export workflow state to JSON
provisioning workflow state <workflow-id> --format json > workflow-state.json

# Inspect checkpoints
provisioning workflow checkpoints <workflow-id>

Manual Workflow Retry

# Retry failed workflow from last checkpoint
provisioning workflow retry <workflow-id>

# Retry from specific checkpoint
provisioning workflow retry <workflow-id> --from-checkpoint 3

# Force retry (skip validation)
provisioning workflow retry <workflow-id> --force

Performance Troubleshooting

Slow Workflow Execution

Diagnosis:

# Profile workflow execution
provisioning workflow profile <workflow-id>

# Identify bottlenecks
provisioning workflow analyze <workflow-id>

Optimization:

# Increase parallelism
provisioning config set execution.max_parallel_tasks 200

# Optimize database queries
provisioning database analyze

# Add caching
provisioning config set cache.enabled true

Database Performance Issues

Diagnosis:

# Check database metrics
curl http://localhost:8000/metrics

# Identify slow queries
provisioning database slow-queries

# Check connection pool
provisioning database pool-status

Optimization:

# Increase connection pool
provisioning config set database.max_connections 200

# Add indexes
provisioning database create-indexes

# Optimize vacuum settings
provisioning database vacuum

Log Analysis

Centralized Log Viewing

# View all platform logs
journalctl -u provisioning-* -f

# Filter by severity
journalctl -u provisioning-* -p err

# Export logs for analysis
journalctl -u provisioning-* --since "1 hour ago" > /tmp/logs.txt

Structured Log Queries

Using Loki with LogQL:

# Find errors in orchestrator
{job="provisioning-orchestrator"} |= "ERROR"

# Workflow failures
{job="provisioning-orchestrator"} | json | status="failed"

# API request latency over 1s
{job="provisioning-control-center"} | json | duration > 1

Log Correlation

# Correlate logs by request ID
journalctl -u provisioning-* | grep "request_id=abc123"

# Trace workflow execution
provisioning workflow trace <workflow-id>

Advanced Debugging

Enable Rust Backtrace

# Enable backtrace for Rust services
export RUST_BACKTRACE=1
sudo systemctl restart provisioning-orchestrator

# Full backtrace
export RUST_BACKTRACE=full

Core Dump Analysis

# Enable core dumps
sudo sysctl -w kernel.core_pattern=/var/crash/core.%e.%p
ulimit -c unlimited

# Analyze core dump
sudo coredumpctl list
sudo coredumpctl debug <pid>

# In gdb:
(gdb) bt
(gdb) info threads
(gdb) thread apply all bt

Network Traffic Analysis

# Capture API traffic
sudo tcpdump -i any -w /tmp/api-traffic.pcap port 8080

# Analyze with tshark
tshark -r /tmp/api-traffic.pcap -Y "http"

Getting Help

Collect Diagnostic Information

# Generate comprehensive diagnostic report
provisioning diagnose --full --output /tmp/diagnostics.tar.gz

# Report includes:
# - Service status
# - Configuration files
# - Recent logs (last 1000 lines per service)
# - Resource usage metrics
# - Database status
# - Network connectivity tests
# - Workflow states

Support Channels

  1. Check documentation: provisioning help <topic>
  2. Search logs: journalctl -u provisioning-*
  3. Review monitoring dashboards: http://localhost:3000
  4. Run diagnostics: provisioning diagnose
  5. Contact support with diagnostic report

Preventive Measures

  • Enable comprehensive monitoring and alerting
  • Implement regular health checks
  • Maintain up-to-date documentation
  • Test disaster recovery procedures monthly
  • Keep platform and dependencies updated
  • Review logs regularly for warning signs
  • Monitor resource utilization trends
  • Validate configuration changes before applying

Platform Health

Health monitoring, status checks, and system integrity validation for the Provisioning platform.

Health Check Overview

The platform provides multiple levels of health monitoring:

| Level | Scope | Frequency | Response Time |
|-------|-------|-----------|---------------|
| Service Health | Individual service status | Every 10s | < 100ms |
| System Health | Overall platform status | Every 30s | < 500ms |
| Infrastructure Health | Managed resources | Every 60s | < 2s |
| Dependency Health | External services | Every 60s | < 1s |

Quick Health Check

# Check overall platform health
provisioning health

# Output:
# ✓ Orchestrator: healthy (uptime: 5d 3h)
# ✓ Control Center: healthy
# ✓ Vault Service: healthy
# ✓ Database: healthy (connections: 45/100)
# ✓ Network: healthy
# ✗ MCP Server: degraded (high latency)

# Exit code: 0 = healthy, 1 = degraded, 2 = unhealthy
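The documented exit codes make the command easy to script. A sketch of a wrapper that maps each code to an action (the exit code is stubbed here rather than calling the CLI):

```shell
# In a real script: provisioning health; rc=$?
rc=1   # stub: pretend the platform reported "degraded"

case "$rc" in
  0) action="proceed" ;;
  1) action="alert on-call, proceed with caution" ;;
  2) action="abort deployment" ;;
  *) action="unknown status" ;;
esac
echo "$action"
```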

Service Health Endpoints

All services expose /health endpoints returning standardized responses.

Orchestrator Health

curl http://localhost:8080/health
{
  "status": "healthy",
  "version": "5.0.0",
  "uptime_seconds": 432000,
  "checks": {
    "database": "healthy",
    "file_system": "healthy",
    "memory": "healthy"
  },
  "metrics": {
    "active_workflows": 12,
    "queued_tasks": 45,
    "completed_tasks": 9876,
    "worker_threads": 8
  },
  "timestamp": "2026-01-16T10:30:00Z"
}

Health status values:

  • healthy - Service operating normally
  • degraded - Service functional with reduced capacity
  • unhealthy - Service not functioning
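Assuming worst-of aggregation (which matches the exit-code convention above: one degraded component degrades the platform, one unhealthy component makes it unhealthy), the overall status can be derived from the component statuses. A sketch with a hardcoded component list:

```shell
# Worst-of aggregation over component statuses
components="healthy healthy degraded healthy"
overall="healthy"
for s in $components; do
  if [ "$s" = "unhealthy" ]; then
    overall="unhealthy"
  elif [ "$s" = "degraded" ] && [ "$overall" = "healthy" ]; then
    overall="degraded"
  fi
done
echo "$overall"
```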

Control Center Health

curl http://localhost:8081/health
{
  "status": "healthy",
  "version": "5.0.0",
  "checks": {
    "database": "healthy",
    "orchestrator": "healthy",
    "vault": "healthy",
    "auth": "healthy"
  },
  "metrics": {
    "active_sessions": 23,
    "api_requests_per_second": 156,
    "p95_latency_ms": 45
  }
}

Vault Service Health

curl http://localhost:8085/health
{
  "status": "healthy",
  "checks": {
    "kms_backend": "healthy",
    "encryption": "healthy",
    "key_rotation": "healthy"
  },
  "metrics": {
    "active_secrets": 234,
    "encryption_ops_per_second": 50,
    "kms_latency_ms": 3
  }
}

System Health Checks

Comprehensive Health Check

# Run all health checks
provisioning health check --all

# Check specific components
provisioning health check --components orchestrator,database,network

# Output detailed report
provisioning health check --detailed --output /tmp/health-report.json

Health Check Components

Platform health checking verifies:

  1. Service Availability - All services responding
  2. Database Connectivity - SurrealDB reachable and responsive
  3. Filesystem Health - Disk space and I/O performance
  4. Network Connectivity - Internal and external connectivity
  5. Resource Utilization - CPU, memory, disk within limits
  6. Dependency Status - External services available
  7. Security Status - Authentication and encryption functional

Database Health

# Check database health
provisioning health database

# Output:
# ✓ Connection: healthy (latency: 2ms)
# ✓ Disk usage: 45% (22GB / 50GB)
# ✓ Active connections: 45 / 100
# ✓ Query performance: healthy (avg: 15ms)
# ✗ Replication: warning (lag: 5s)

Detailed database metrics:

# Connection pool status
provisioning database pool-status

# Slow query analysis
provisioning database slow-queries --threshold 1000ms

# Storage usage
provisioning database storage-stats

Filesystem Health

# Check disk space and I/O
provisioning health filesystem

# Output:
# ✓ Root filesystem: 65% used (325GB / 500GB)
# ✓ Data filesystem: 45% used (225GB / 500GB)
# ✓ I/O latency: healthy (avg: 5ms)
# ✗ Inodes: warning (85% used)

Check specific paths:

# Check data directory
df -h /var/lib/provisioning

# Check I/O performance
iostat -x 1 5
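A threshold check like the inode warning above is easy to reproduce in a cron script. A minimal sketch (the usage value is hardcoded; a real check would parse it from `df --output=ipcent`, and the 80% threshold is an arbitrary example):

```shell
used=85        # percent, hardcoded for the example
threshold=80

if [ "$used" -ge "$threshold" ]; then
  msg="warning: inode usage at ${used}% (threshold ${threshold}%)"
else
  msg="ok: inode usage at ${used}%"
fi
echo "$msg"
```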

Network Health

# Check network connectivity
provisioning health network

# Test external connectivity
provisioning health network --external

# Test provider connectivity
provisioning health network --provider upcloud

Network health checks:

  • Internal service-to-service connectivity
  • DNS resolution
  • External API reachability (cloud providers)
  • Network latency and packet loss
  • Firewall rules validation

Resource Monitoring

CPU Health

# Check CPU utilization
provisioning health cpu

# Per-service CPU usage
provisioning platform metrics --metric cpu_usage

# Alert if CPU > 90% for 5 minutes

Monitor CPU load:

# System load average
uptime

# Per-process CPU
top -b -n 1 | grep provisioning

Memory Health

# Check memory utilization
provisioning health memory

# Memory breakdown by service
provisioning platform metrics --metric memory_usage

# Detect memory leaks
provisioning health memory --leak-detection

Memory metrics:

# Available memory
free -h

# Per-service memory
ps aux | grep provisioning | awk '{sum+=$6} END {print sum/1024 " MB"}'

Disk Health

# Check disk health
provisioning health disk

# SMART status (if available)
sudo smartctl -H /dev/sda

Automated Health Monitoring

Health Check Service

Enable continuous health monitoring:

# Start health monitor
provisioning health monitor --interval 30

# Monitor with alerts
provisioning health monitor --interval 30 --alert-email ops@example.com

# Monitor specific components
provisioning health monitor --components orchestrator,database --interval 10

Systemd Health Monitoring

Systemd watchdog for automatic restart on failure:

# /etc/systemd/system/provisioning-orchestrator.service
[Service]
Type=notify
WatchdogSec=30
Restart=on-failure
RestartSec=10
StartLimitIntervalSec=300
StartLimitBurst=5

Service sends periodic health status:

// Rust service code (sd_notify crate)
use sd_notify::NotifyState;

// Called periodically from the service's main loop to pet the watchdog
sd_notify::notify(true, &[NotifyState::Watchdog])?;

Health Dashboards

Grafana Health Dashboard

Import platform health dashboard:

provisioning monitoring install-dashboard --name platform-health

Dashboard panels:

  • Service status indicators
  • Resource utilization gauges
  • Error rate graphs
  • Latency histograms
  • Workflow success rate
  • Database connection pool

Access: http://localhost:3000/d/platform-health

CLI Health Dashboard

Real-time health monitoring in terminal:

# Interactive health dashboard
provisioning health dashboard

# Auto-refresh every 5 seconds
provisioning health dashboard --refresh 5

Health Alerts

Prometheus Alert Rules

# Platform health alerts
groups:
  - name: platform_health
    rules:
      - alert: ServiceUnhealthy
        expr: up{job=~"provisioning-.*"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Service is unhealthy"

      - alert: HighMemoryUsage
        expr: process_resident_memory_bytes > 4e9
        for: 5m
        labels:
          severity: warning

      - alert: DatabaseConnectionPoolExhausted
        expr: database_connection_pool_active / database_connection_pool_max > 0.9
        for: 2m
        labels:
          severity: critical

Health Check Notifications

Configure health check notifications:

# /etc/provisioning/health.toml
[notifications]
enabled = true

[notifications.email]
enabled = true
smtp_server = "smtp.example.com"
from = "health@provisioning.example.com"
to = ["ops@example.com"]

[notifications.slack]
enabled = true
webhook_url = "https://hooks.slack.com/services/..."
channel = "#provisioning-health"

[notifications.pagerduty]
enabled = true
service_key = "..."

Dependency Health

External Service Health

Check health of dependencies:

# Check cloud provider API
provisioning health dependency upcloud

# Check vault service
provisioning health dependency vault

# Check all dependencies
provisioning health dependency --all

Dependency health includes:

  • API reachability
  • Authentication validity
  • API quota/rate limits
  • Service degradation status

Third-party Service Monitoring

Monitor integrated services:

# Kubernetes cluster health (if managing K8s)
provisioning health kubernetes

# Database replication health
provisioning health database --replication

# Secret store health
provisioning health secrets

Health Metrics

Key metrics tracked for health monitoring:

Service Metrics

provisioning_service_up{service="orchestrator"} 1
provisioning_service_health_status{service="orchestrator"} 1
provisioning_service_uptime_seconds{service="orchestrator"} 432000

Resource Metrics

provisioning_cpu_usage_percent 45
provisioning_memory_usage_bytes 2.5e9
provisioning_disk_usage_percent{mount="/var/lib/provisioning"} 45
provisioning_network_errors_total 0

Performance Metrics

provisioning_api_latency_p50_ms 25
provisioning_api_latency_p95_ms 85
provisioning_api_latency_p99_ms 150
provisioning_workflow_duration_seconds 45

Health Best Practices

  • Monitor all critical services continuously
  • Set appropriate alert thresholds
  • Test alert notifications regularly
  • Maintain health check runbooks
  • Review health metrics weekly
  • Establish health baselines
  • Automate remediation where possible
  • Document health status definitions
  • Integrate health checks with CI/CD
  • Monitor upstream dependencies

Troubleshooting Unhealthy State

When health check fails:

# 1. Identify unhealthy component
provisioning health check --detailed

# 2. View component logs
journalctl -u provisioning-<component> -n 100

# 3. Check resource availability
provisioning health resources

# 4. Restart unhealthy service
sudo systemctl restart provisioning-<component>

# 5. Verify recovery
provisioning health check

# 6. Review recent changes
git log --since="1 day ago" -- /etc/provisioning/

Security System

Enterprise-grade security infrastructure with 12 integrated components providing authentication, authorization, encryption, and compliance.

Overview

The Provisioning platform security system delivers comprehensive protection across all layers of the infrastructure automation platform. Built for enterprise deployments, it provides defense-in-depth through multiple security controls working together.

Security Architecture

The security system is organized into 12 core components:

| Component | Purpose | Key Features |
|-----------|---------|--------------|
| Authentication | User identity verification | JWT tokens, session management, multi-provider auth |
| Authorization | Access control enforcement | Cedar policy engine, RBAC, fine-grained permissions |
| MFA | Multi-factor authentication | TOTP, WebAuthn/FIDO2, backup codes |
| Audit Logging | Comprehensive audit trails | 7-year retention, 5 export formats, compliance reporting |
| KMS | Key management | 5 KMS backends, envelope encryption, key rotation |
| Secrets Management | Secure secret storage | SecretumVault integration, SOPS/Age, dynamic secrets |
| Encryption | Data protection | At-rest and in-transit encryption, AES-256-GCM |
| Secure Communication | Network security | TLS/mTLS, certificate management, secure channels |
| Certificate Management | PKI operations | CA management, certificate issuance, rotation |
| Compliance | Regulatory adherence | SOC2, GDPR, HIPAA, policy enforcement |
| Security Testing | Validation framework | 350+ tests, vulnerability scanning, penetration testing |
| Break-Glass | Emergency access | Multi-party approval, audit trails, time-limited access |

Security Layers

Layer 1: Identity and Access

[Diagram: authentication flow covering JWT, OAuth, MFA, token refresh, and sessions]

  • Authentication: Verify user identity with JWT tokens and Argon2id password hashing

[Diagram: authorization via the Cedar policy engine and RBAC, with permit/deny evaluation]

  • Authorization: Enforce access control with Cedar policies and RBAC
  • MFA: Add second factor with TOTP or FIDO2 hardware keys

Layer 2: Data Protection

[Diagram: encryption layers covering at-rest, in-transit, post-quantum, and secrets management]

  • Encryption: Protect data at rest with AES-256-GCM and in transit with TLS 1.3
  • Secrets Management: Store secrets securely in SecretumVault with automatic rotation
  • KMS: Manage encryption keys with envelope encryption across 5 backend options

Layer 3: Network Security

  • Secure Communication: Enforce TLS/mTLS for all service-to-service communication
  • Certificate Management: Automate certificate lifecycle with cert-manager integration
  • Network Policies: Control traffic flow with Kubernetes NetworkPolicies

Layer 4: Compliance and Monitoring

  • Audit Logging: Record all security events with 7-year retention
  • Compliance: Validate against SOC2, GDPR, and HIPAA frameworks
  • Security Testing: Continuous validation with automated security test suite

Performance Characteristics

  • Authentication Overhead: Less than 20ms per request with JWT verification
  • Authorization Decision: Less than 10ms with Cedar policy evaluation
  • Encryption Operations: Less than 5ms with KMS-backed envelope encryption
  • Audit Logging: Asynchronous with zero blocking on critical path
  • MFA Verification: Less than 100ms for TOTP, less than 500ms for WebAuthn

Security Standards

The security system adheres to industry standards and best practices:

  • OWASP Top 10: Protection against common web vulnerabilities
  • NIST Cybersecurity Framework: Aligned with identify, protect, detect, respond, recover
  • Zero Trust Architecture: Never trust, always verify principle
  • Defense in Depth: Multiple layers of security controls
  • Least Privilege: Minimal access rights for users and services
  • Secure by Default: Security controls enabled out of the box

Component Integration

All security components work together as a cohesive system:

┌─────────────────────────────────────────────────────────────┐
│                    User Request                             │
└──────────────────────┬──────────────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────────────┐
│  Authentication (JWT + Session)                             │
│  ↓                                                           │
│  Authorization (Cedar Policies)                             │
│  ↓                                                           │
│  MFA Verification (if required)                             │
└──────────────────────┬──────────────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────────────┐
│  Audit Logging (Record all actions)                         │
└──────────────────────┬──────────────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────────────┐
│  Secure Communication (TLS/mTLS)                            │
│  ↓                                                           │
│  Data Access (Encrypted with KMS)                           │
│  ↓                                                           │
│  Secrets Retrieved (SecretumVault)                          │
└──────────────────────┬──────────────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────────────┐
│  Compliance Validation (SOC2/GDPR checks)                   │
└──────────────────────┬──────────────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────────────┐
│                    Response                                 │
└─────────────────────────────────────────────────────────────┘

Security Configuration

Security settings are managed through hierarchical configuration:

# Security defaults in config/security.toml
[security]
auth_enabled = true
mfa_required = true
audit_enabled = true
encryption_at_rest = true
tls_min_version = "1.3"

[security.jwt]
algorithm = "RS256"
access_token_ttl = 900        # 15 minutes
refresh_token_ttl = 604800    # 7 days

[security.mfa]
totp_enabled = true
webauthn_enabled = true
backup_codes_count = 10

[security.kms]
backend = "secretumvault"
envelope_encryption = true
key_rotation_days = 90

[security.audit]
retention_days = 2555         # 7 years
export_formats = ["json", "csv", "parquet", "sqlite", "syslog"]

[security.compliance]
frameworks = ["soc2", "gdpr", "hipaa"]
policy_enforcement = "strict"

Quick Start

Enable security system for your deployment:

# Enable all security features
provisioning config set security.enabled true

# Configure authentication
provisioning config set security.auth.jwt_algorithm RS256
provisioning config set security.auth.mfa_required true

# Set up SecretumVault integration
provisioning config set security.secrets.backend secretumvault
provisioning config set security.secrets.url http://localhost:8200

# Enable audit logging
provisioning config set security.audit.enabled true
provisioning config set security.audit.retention_days 2555

# Configure compliance framework
provisioning config set security.compliance.frameworks soc2,gdpr

# Verify security configuration
provisioning security validate

Documentation Structure

This security documentation is organized into 12 detailed guides:

  1. Authentication - JWT token-based authentication and session management
  2. Authorization - Cedar policy engine and RBAC access control
  3. Multi-Factor Authentication - TOTP and WebAuthn/FIDO2 implementation
  4. Audit Logging - Comprehensive audit trails and compliance reporting
  5. Key Management Service - Encryption key management and rotation
  6. Secrets Management - SecretumVault and SOPS/Age integration
  7. Encryption - At-rest and in-transit data protection
  8. Secure Communication - TLS/mTLS and network security
  9. Certificate Management - PKI and certificate lifecycle
  10. Compliance - SOC2, GDPR, HIPAA frameworks
  11. Security Testing - Test suite and vulnerability scanning
  12. Break-Glass Procedures - Emergency access and recovery

Security Metrics

The security system tracks key metrics for monitoring and reporting:

  • Authentication Success Rate: Percentage of successful login attempts
  • MFA Adoption Rate: Percentage of users with MFA enabled
  • Policy Violations: Count of authorization denials
  • Audit Event Rate: Events logged per second
  • Secret Rotation Compliance: Percentage of secrets rotated within policy
  • Certificate Expiration: Days until certificate expiration
  • Compliance Score: Overall compliance posture percentage
  • Security Test Pass Rate: Percentage of security tests passing

Best Practices

Follow these security best practices:

  1. Enable MFA for all users: Require second factor for all accounts
  2. Rotate secrets regularly: Automate secret rotation every 90 days
  3. Monitor audit logs: Review security events daily
  4. Test security controls: Run security test suite before deployments
  5. Keep certificates current: Automate certificate renewal 30 days before expiration
  6. Review policies regularly: Audit Cedar policies quarterly
  7. Limit break-glass access: Require multi-party approval for emergency access
  8. Encrypt all data: Enable encryption at rest and in transit
  9. Follow least privilege: Grant minimal required permissions
  10. Validate compliance: Run compliance checks before production deployments

Getting Help

For security issues and questions:

  • Security Documentation: Complete guides in this security section
  • CLI Help: provisioning security help
  • Security Validation: provisioning security validate
  • Audit Query: provisioning security audit query
  • Compliance Check: provisioning security compliance check

Security Updates

The security system is continuously updated to address emerging threats and vulnerabilities. Subscribe to security advisories and apply updates promptly.


Next Steps:

Authentication

JWT token-based authentication with session management, login flows, and multi-provider support.

Overview

The authentication system verifies user identity through JWT (JSON Web Token) tokens with RS256 signatures and Argon2id password hashing. It provides secure session management, token refresh capabilities, and support for multiple authentication providers.

Architecture

Authentication Flow

┌──────────┐                ┌──────────────┐                ┌────────────┐
│  Client  │                │  Auth Service│                │  Database  │
└────┬─────┘                └──────┬───────┘                └─────┬──────┘
     │                             │                              │
     │  POST /auth/login           │                              │
     │  {username, password}       │                              │
     │────────────────────────────>│                              │
     │                             │                              │
     │                             │  Find user by username       │
     │                             │─────────────────────────────>│
     │                             │<─────────────────────────────│
     │                             │  User record                 │
     │                             │                              │
     │                             │  Verify password (Argon2id)  │
     │                             │                              │
     │                             │  Create session              │
     │                             │─────────────────────────────>│
     │                             │<─────────────────────────────│
     │                             │                              │
     │                             │  Generate JWT token pair     │
     │                             │                              │
     │  {access_token, refresh}    │                              │
     │<────────────────────────────│                              │
     │                             │                              │

Components

| Component | Purpose | Technology |
|-----------|---------|------------|
| AuthService | Core authentication logic | Rust service in control-center |
| JwtService | Token generation and verification | RS256 algorithm with jsonwebtoken crate |
| SessionManager | Session lifecycle management | Database-backed session storage |
| PasswordHasher | Password hashing and verification | Argon2id with configurable parameters |
| UserService | User account management | CRUD operations with role assignment |

JWT Token Structure

Access Token

Short-lived token for API authentication (default: 15 minutes).

{
  "header": {
    "alg": "RS256",
    "typ": "JWT"
  },
  "payload": {
    "sub": "550e8400-e29b-41d4-a716-446655440000",
    "email": "user@example.com",
    "username": "alice",
    "roles": ["user", "developer"],
    "session_id": "sess_abc123",
    "mfa_verified": true,
    "permissions_hash": "sha256:abc123...",
    "iat": 1704067200,
    "exp": 1704068100,
    "iss": "provisioning-platform",
    "aud": "api.provisioning.example.com"
  }
}

Refresh Token

Long-lived token for obtaining new access tokens (default: 7 days).

{
  "header": {
    "alg": "RS256",
    "typ": "JWT"
  },
  "payload": {
    "sub": "550e8400-e29b-41d4-a716-446655440000",
    "session_id": "sess_abc123",
    "token_type": "refresh",
    "iat": 1704067200,
    "exp": 1704672000,
    "iss": "provisioning-platform"
  }
}
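For debugging, either token's claims can be inspected by base64url-decoding its middle segment. This skips signature verification, so it is for inspection only; the token below is a made-up example with a minimal payload:

```shell
token='xxxxx.eyJzdWIiOiJhbGljZSJ9.yyyyy'    # header.payload.signature

payload=${token#*.}          # drop the header segment
payload=${payload%%.*}       # drop the signature segment

# base64url -> base64: restore '=' padding and swap the URL-safe alphabet
while [ $(( ${#payload} % 4 )) -ne 0 ]; do payload="${payload}="; done
claims=$(printf '%s' "$payload" | tr '_-' '/+' | base64 -d)
echo "$claims"    # {"sub":"alice"}
```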

Password Security

Argon2id Configuration

Password hashing uses Argon2id with security-hardened parameters:

// Default Argon2id parameters (argon2 crate)
let params = argon2::Params::new(
    65536,     // m_cost: 64 MB memory
    3,         // t_cost: 3 iterations
    4,         // p_cost: 4 lanes of parallelism
    Some(32),  // 32-byte output hash
)?;

Password Requirements

Default password policy enforces:

  • Minimum 12 characters
  • At least one uppercase letter
  • At least one lowercase letter
  • At least one digit
  • At least one special character
  • Not in common password list
  • Not similar to username or email
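
The rules above translate directly into a validator. An illustrative Python sketch (the common-password list and the similarity checks here are simplified placeholders, not the platform's actual implementation):

```python
import string

COMMON_PASSWORDS = {"password123!", "qwerty123456"}  # placeholder list

def check_password(password: str, username: str = "", email: str = "") -> list[str]:
    """Return a list of policy violations (an empty list means the password passes)."""
    errors = []
    if len(password) < 12:
        errors.append("minimum 12 characters")
    if not any(c.isupper() for c in password):
        errors.append("needs an uppercase letter")
    if not any(c.islower() for c in password):
        errors.append("needs a lowercase letter")
    if not any(c.isdigit() for c in password):
        errors.append("needs a digit")
    if not any(c in string.punctuation for c in password):
        errors.append("needs a special character")
    if password.lower() in COMMON_PASSWORDS:
        errors.append("in common password list")
    if username and username.lower() in password.lower():
        errors.append("too similar to username")
    if email and email.split("@")[0].lower() in password.lower():
        errors.append("too similar to email")
    return errors

print(check_password("SecurePassword123!", "alice"))  # → []
```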

Session Management

Session Lifecycle

  1. Creation: New session created on successful login
  2. Active: Session tracked with last activity timestamp
  3. Refresh: Session extended on token refresh
  4. Expiration: Session expires after inactivity timeout
  5. Revocation: Manual logout or security event terminates session

Session Storage

Sessions are stored in the database with the following structure:

pub struct Session {
    pub session_id: Uuid,
    pub user_id: Uuid,
    pub created_at: DateTime<Utc>,
    pub expires_at: DateTime<Utc>,
    pub last_activity: DateTime<Utc>,
    pub ip_address: Option<String>,
    pub user_agent: Option<String>,
    pub is_active: bool,
}

Session Tracking

Track multiple concurrent sessions per user:

# List active sessions for user
provisioning security sessions list --user alice

# Revoke specific session
provisioning security sessions revoke --session-id sess_abc123

# Revoke all sessions except current
provisioning security sessions revoke-all --except-current

Login Flows

Standard Login

Basic username/password authentication:

# CLI login
provisioning auth login --username alice --password <password>

# API login
curl -X POST https://api.provisioning.example.com/auth/login \
  -H "Content-Type: application/json" \
  -d '{
    "username_or_email": "alice",
    "password": "SecurePassword123!",
    "client_info": {
      "ip_address": "192.168.1.100",
      "user_agent": "provisioning-cli/1.0"
    }
  }'

Response:

{
  "access_token": "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9...",
  "refresh_token": "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9...",
  "token_type": "Bearer",
  "expires_in": 900,
  "user": {
    "user_id": "550e8400-e29b-41d4-a716-446655440000",
    "username": "alice",
    "email": "alice@example.com",
    "roles": ["user", "developer"]
  }
}

MFA Login

Two-phase authentication with MFA:

# Phase 1: Initial authentication
provisioning auth login --username alice --password <password>

# Response indicates MFA required
# {
#   "mfa_required": true,
#   "mfa_token": "temp_token_abc123",
#   "available_methods": ["totp", "webauthn"]
# }

# Phase 2: MFA verification
provisioning auth mfa-verify --mfa-token temp_token_abc123 --code 123456

SSO Login

Single Sign-On with external providers:

# Initiate SSO flow
provisioning auth sso --provider okta

# Or with SAML
provisioning auth sso --provider azure-ad --protocol saml

Token Refresh

Automatic Refresh

Client libraries automatically refresh tokens before expiration:

// Automatic token refresh in Rust client
let client = ProvisioningClient::new()
    .with_auto_refresh(true)
    .build()?;

// Tokens refreshed transparently
client.server().list().await?;

Manual Refresh

Explicit token refresh when needed:

# CLI token refresh
provisioning auth refresh

# API token refresh
curl -X POST https://api.provisioning.example.com/auth/refresh \
  -H "Content-Type: application/json" \
  -d '{
    "refresh_token": "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9..."
  }'

Response:

{
  "access_token": "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9...",
  "token_type": "Bearer",
  "expires_in": 900
}

Multi-Provider Authentication

Supported Providers

| Provider | Type | Configuration |
|---|---|---|
| Local | Username/password | Built-in user database |
| LDAP | Directory service | Active Directory, OpenLDAP |
| SAML | SSO | Okta, Azure AD, OneLogin |
| OIDC | OAuth2/OpenID | Google, GitHub, Auth0 |
| mTLS | Certificate | Client certificate authentication |

Provider Configuration

[auth.providers.ldap]
enabled = true
server = "ldap://ldap.example.com"
base_dn = "dc=example,dc=com"
bind_dn = "cn=admin,dc=example,dc=com"
user_filter = "(uid={username})"

[auth.providers.saml]
enabled = true
entity_id = "https://provisioning.example.com"
sso_url = "https://okta.example.com/sso/saml"
certificate_path = "/etc/provisioning/saml-cert.pem"

[auth.providers.oidc]
enabled = true
issuer = "https://accounts.google.com"
client_id = "client_id_here"
client_secret = "client_secret_here"
redirect_uri = "https://provisioning.example.com/auth/callback"

Token Validation

JWT Verification

All API requests validate JWT tokens:

// Middleware validates JWT on every request
pub async fn jwt_auth_middleware(
    headers: HeaderMap,
    State(jwt_service): State<Arc<JwtService>>,
    mut request: Request,
    next: Next,
) -> Result<Response, AuthError> {
    // Extract token from Authorization header
    let token = extract_bearer_token(&headers)?;

    // Verify signature and claims
    let claims = jwt_service.verify_access_token(&token)?;

    // Check expiration
    if claims.exp < Utc::now().timestamp() {
        return Err(AuthError::TokenExpired);
    }

    // Inject user context into request
    request.extensions_mut().insert(claims);

    Ok(next.run(request).await)
}

Token Revocation

Revoke tokens on security events:

# Revoke all tokens for user
provisioning security tokens revoke-user --user alice

# Revoke specific token
provisioning security tokens revoke --token-id token_abc123

# Check token status
provisioning security tokens status --token eyJhbGci...

Security Hardening

Configuration

Secure authentication settings:

[security.auth]
# JWT settings
jwt_algorithm = "RS256"
jwt_issuer = "provisioning-platform"
access_token_ttl = 900           # 15 minutes
refresh_token_ttl = 604800       # 7 days
token_leeway = 30                # 30 seconds clock skew

# Password policy
password_min_length = 12
password_require_uppercase = true
password_require_lowercase = true
password_require_digit = true
password_require_special = true
password_check_common = true

# Session settings
session_timeout = 1800           # 30 minutes inactivity
max_sessions_per_user = 5
remember_me_duration = 2592000   # 30 days

# Security controls
enforce_mfa = true
allow_password_reset = true
lockout_after_attempts = 5
lockout_duration = 900           # 15 minutes

Best Practices

  1. Use strong passwords: Enforce password policy with minimum 12 characters
  2. Enable MFA: Require second factor for all users
  3. Rotate keys regularly: Update JWT signing keys every 90 days
  4. Monitor failed attempts: Alert on suspicious login patterns
  5. Limit session duration: Use short access token TTL with refresh tokens
  6. Secure token storage: Store tokens securely, never in local storage
  7. Validate on every request: Always verify JWT signature and expiration
  8. Use HTTPS only: Never transmit tokens over unencrypted connections

CLI Integration

Login and Session Management

# Login with credentials
provisioning auth login --username alice

# Login with MFA
provisioning auth login --username alice --mfa

# Check authentication status
provisioning auth status

# Logout (revoke session)
provisioning auth logout

# List active sessions
provisioning security sessions list

# Refresh token
provisioning auth refresh

Token Management

# Show current token
provisioning auth token show

# Validate token
provisioning auth token validate

# Decode token (without verification)
provisioning auth token decode

# Revoke token
provisioning auth token revoke

API Reference

Endpoints

| Endpoint | Method | Purpose |
|---|---|---|
| /auth/login | POST | Authenticate with credentials |
| /auth/refresh | POST | Refresh access token |
| /auth/logout | POST | Revoke session and tokens |
| /auth/verify | POST | Verify MFA code |
| /auth/sessions | GET | List active sessions |
| /auth/sessions/:id | DELETE | Revoke specific session |
| /auth/password-reset | POST | Initiate password reset |
| /auth/password-change | POST | Change password |

Troubleshooting

Common Issues

Token expired errors:

# Refresh token
provisioning auth refresh

# Or re-login
provisioning auth login

Invalid signature:

# Check JWT configuration
provisioning config get security.auth.jwt_algorithm

# Verify public key is correct
provisioning security keys verify

MFA verification failures:

# Check time sync (TOTP requires accurate time)
ntpdate -q pool.ntp.org

# Re-sync MFA device
provisioning auth mfa-setup --resync

Session not found:

# Clear local session and re-login
provisioning auth logout
provisioning auth login

Monitoring

Metrics

Track authentication metrics:

  • Login success rate
  • Failed login attempts per user
  • Average session duration
  • Token refresh rate
  • MFA verification success rate
  • Active sessions count

Alerts

Configure alerts for security events:

  • Multiple failed login attempts
  • Login from new location
  • Unusual authentication patterns
  • Session hijacking attempts
  • Token tampering detected

Next Steps:

Authorization

Multi-Factor Authentication

Audit Logging

KMS Guide

Secrets Management

SecretumVault Integration Guide

SecretumVault is a post-quantum cryptography (PQC) secure vault system integrated with Provisioning’s vault-service. It provides quantum-resistant encryption for sensitive credentials and infrastructure secrets.

Overview

SecretumVault combines:

  • Post-Quantum Cryptography: Algorithms resistant to quantum computer attacks
  • Hardware Acceleration: Optional FPGA acceleration for performance
  • Distributed Architecture: Multi-node secure storage
  • Compliance: FIPS 140-3 ready, NIST standards

Architecture

Integration Points

Provisioning
    ├─ CLI (Nushell)
    │   └─ nu_plugin_secretumvault
    │
    ├─ vault-service (Rust)
    │   ├─ secretumvault backend
    │   ├─ rustyvault compatibility
    │   └─ SOPS + Age integration
    │
    └─ Control Center
        └─ Secret management UI

Cryptographic Stack

User Secret
    ↓
KDF (Key Derivation Function)
    ├─ Argon2id (password-based)
    └─ HKDF (key-based)
    ↓
PQC Encryption Layer
    ├─ CRYSTALS-Kyber (key encapsulation)
    ├─ Falcon (signature)
    ├─ SPHINCS+ (backup signature)
    └─ Hybrid: PQC + Classical (AES-256)
    ↓
Authenticated Encryption
    ├─ ChaCha20-Poly1305
    └─ AES-256-GCM
    ↓
Secure Storage
    ├─ Local vault
    ├─ SurrealDB
    └─ Hardware module (optional)

Installation

Install SecretumVault

# Install via provisioning
provisioning install secretumvault

# Or manual installation
cd /path/to/secretumvault
cargo install --path .

# Verify installation
secretumvault --version

Install Nushell Plugin

# Install plugin
provisioning install nu-plugin-secretumvault

# Reload Nushell
nu -c "plugin add nu_plugin_secretumvault"

# Verify
nu -c "secretumvault-plugin version"

Configuration

Environment Setup

# Set vault location
export SECRETUMVAULT_HOME=~/.secretumvault

# Set encryption algorithm
export SECRETUMVAULT_CIPHER=kyber-aes  # kyber-aes, falcon-aes, hybrid

# Set key derivation
export SECRETUMVAULT_KDF=argon2id      # argon2id, pbkdf2

# Enable hardware acceleration (optional)
export SECRETUMVAULT_HW_ACCEL=enabled

Configuration File

# ~/.secretumvault/config.yaml
vault:
  storage_backend: surrealdb          # local, surrealdb, redis
  encryption_cipher: kyber-aes        # kyber-aes, falcon-aes, hybrid
  key_derivation: argon2id            # argon2id, pbkdf2

  # Argon2id parameters (password strength)
  kdf:
    memory: 65536                     # KB
    iterations: 3
    parallelism: 4

  # Encryption parameters
  encryption:
    key_length: 256                   # bits
    nonce_length: 12                  # bytes
    auth_tag_length: 16               # bytes

# Database backend (if using SurrealDB)
database:
  url: "surrealdb://localhost:8000"
  namespace: "provisioning"
  database: "secrets"

# Hardware acceleration (optional)
hardware:
  use_fpga: false
  fpga_device: "/dev/fpga0"

# Backup configuration
backup:
  enabled: true
  interval: 24                        # hours
  retention: 30                       # days
  encrypt_backup: true
  backup_path: ~/.secretumvault/backups

# Access logging
audit:
  enabled: true
  log_file: ~/.secretumvault/audit.log
  log_level: info
  rotate_logs: true
  retention_days: 365

# Master key management
master_key:
  protection: none                    # none, tpm, hsm, hardware-module
  rotation_enabled: true
  rotation_interval: 90               # days

Usage

Command Line Interface

# Create master key
secretumvault init

# Add secret
secretumvault secret add \
  --name database-password \
  --value "supersecret" \
  --metadata "type=database,app=api"

# Retrieve secret
secretumvault secret get database-password

# List secrets
secretumvault secret list

# Delete secret
secretumvault secret delete database-password

# Rotate key
secretumvault key rotate

# Backup vault
secretumvault backup create --output vault-backup.enc

# Restore vault
secretumvault backup restore vault-backup.enc

Nushell Integration

# Load SecretumVault plugin
plugin add nu_plugin_secretumvault

# Add secret from Nushell
let password = "mypassword"
secretumvault-plugin store "app-secret" $password

# Retrieve secret
let db_pass = (secretumvault-plugin retrieve "database-password")

# List all secrets
secretumvault-plugin list

# Delete secret
secretumvault-plugin delete "old-secret"

# Rotate key
secretumvault-plugin rotate-key

Provisioning Integration

# Configure vault-service to use SecretumVault
provisioning config set security.vault.backend secretumvault

# Enable in form prefill
provisioning setup profile --use-secretumvault

# Manage secrets via CLI
provisioning vault add \
  --name aws-access-key \
  --value "AKIAIOSFODNN7EXAMPLE" \
  --metadata "provider=aws,env=production"

# Use secret in infrastructure
provisioning ai "Create AWS resources using secret aws-access-key"

Post-Quantum Cryptography

Algorithms Supported

| Algorithm | Type | NIST Status | Performance |
|---|---|---|---|
| CRYSTALS-Kyber | KEM | Finalist | Fast |
| Falcon | Signature | Finalist | Medium |
| SPHINCS+ | Hash-based signature | Finalist | Slower |
| AES-256 | Hybrid (classical) | Standard | Very fast |
| ChaCha20 | Stream cipher | Alternative | Fast |

SecretumVault uses hybrid encryption by default:

Secret Input
    ↓
Key Material: Classical (AES-256) + PQC (Kyber)
    ├─ Generate AES key
    ├─ Generate Kyber keypair
    └─ Encapsulate using Kyber
    ↓
Encrypt with both algorithms
    ├─ AES-256-GCM encryption
    └─ Kyber encapsulation (public key cryptography)
    ↓
Both keys required to decrypt
    ├─ If quantum computer breaks Kyber → AES still secure
    └─ If breakthrough in AES → Kyber still secure
    ↓
Encrypted Secret Stored

Advantages:

  • Protection against quantum computers (PQC)
  • Protection against classical attacks (AES-256)
  • Compatible with both current and future threats
  • No single point of failure
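
The "both keys required to decrypt" property typically comes from deriving the data-encryption key from both shared secrets, so breaking one algorithm alone yields nothing usable. A simplified Python sketch using HKDF (RFC 5869); the secret values are hypothetical stand-ins for an AES key share and a Kyber-encapsulated secret:

```python
import hashlib
import hmac

def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """Minimal HKDF (RFC 5869): HMAC-based extract, then expand."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

# Hypothetical shared secrets: one classical, one from a Kyber encapsulation
classical_secret = b"\x01" * 32
pqc_secret = b"\x02" * 32

# The data key depends on BOTH inputs: an attacker who breaks only one
# branch still lacks the combined key material.
data_key = hkdf_sha256(classical_secret + pqc_secret,
                       salt=b"secretumvault", info=b"hybrid-data-key")
print(len(data_key))  # → 32
```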

Key Rotation Strategy

# Manual key rotation
secretumvault key rotate --algorithm kyber-aes

# Scheduled rotation (every 90 days)
secretumvault key rotate --schedule 90d

# Emergency rotation
secretumvault key rotate --emergency --force

Security Features

Authentication

# Master key authentication
secretumvault auth login

# MFA for sensitive operations
secretumvault auth mfa enable --method totp

# Biometric unlock (supported platforms)
secretumvault auth enable-biometric

Access Control

# Set vault permissions
secretumvault acl set database-password \
  --read "api-service,backup-service" \
  --write "admin" \
  --delete "admin"

# View access logs
secretumvault audit log --secret database-password

Audit Logging

Every operation is logged:

# View audit log
secretumvault audit log --since 24h

# Export audit log
secretumvault audit export --format json > audit.json

# Monitor real-time
secretumvault audit monitor

Sample Log Entry:

{
  "timestamp": "2026-01-16T01:47:00Z",
  "operation": "secret_retrieve",
  "secret": "database-password",
  "user": "api-service",
  "status": "success",
  "ip_address": "127.0.0.1",
  "device_id": "device-123"
}

Disaster Recovery

Backup Procedures

# Create encrypted backup
secretumvault backup create \
  --output /secure/vault-backup.enc \
  --compression gzip

# Verify backup integrity
secretumvault backup verify /secure/vault-backup.enc

# Restore from backup
secretumvault backup restore \
  --input /secure/vault-backup.enc \
  --verify-checksum

Recovery Key

# Generate recovery key (for emergencies)
secretumvault recovery-key generate \
  --threshold 3 \
  --shares 5

# Share recovery shards
# Share with 5 trusted people, need 3 to recover

# Recover using shards
secretumvault recovery-key restore \
  --shard1 /secure/shard1.key \
  --shard2 /secure/shard2.key \
  --shard3 /secure/shard3.key
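
The threshold/shares scheme above is Shamir-style secret sharing: any 3 of the 5 shards reconstruct the key, while fewer reveal nothing. An illustrative Python sketch over a prime field — a toy demonstration, not SecretumVault's implementation:

```python
import random

PRIME = 2**127 - 1  # a Mersenne prime, large enough for a demo secret

def make_shares(secret: int, threshold: int, shares: int) -> list[tuple[int, int]]:
    """Split `secret` into points on a random degree-(threshold-1) polynomial."""
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(threshold - 1)]
    def f(x):
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
    return [(x, f(x)) for x in range(1, shares + 1)]

def recover(points: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x=0 recovers the secret from `threshold` points."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

shares = make_shares(123456789, threshold=3, shares=5)
print(recover(shares[:3]) == 123456789)   # → True
print(recover(shares[2:5]) == 123456789)  # → True
```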

Performance

Benchmark Results

| Operation | Time | Algorithm |
|---|---|---|
| Store secret | 50-100 ms | Kyber-AES |
| Retrieve secret | 30-50 ms | Kyber-AES |
| Key rotation | 200-500 ms | Kyber-AES |
| Backup 1000 secrets | 2-3 seconds | Kyber-AES |
| Restore from backup | 3-5 seconds | Kyber-AES |

Hardware Acceleration

With FPGA acceleration:

| Operation | Native | FPGA | Speedup |
|---|---|---|---|
| Store secret | 75 ms | 15 ms | 5x |
| Key rotation | 350 ms | 50 ms | 7x |
| Backup 1000 secrets | 2.5 s | 0.4 s | 6x |

Troubleshooting

Cannot Initialize Vault

# Check permissions
ls -la ~/.secretumvault

# Clear corrupted state
rm ~/.secretumvault/state.lock

# Reinitialize
secretumvault init --force

Slow Performance

# Check algorithm
secretumvault config get encryption.cipher

# Switch to faster algorithm
export SECRETUMVAULT_CIPHER=kyber-aes

# Enable hardware acceleration
export SECRETUMVAULT_HW_ACCEL=enabled

Master Key Lost

# Use recovery key (if available)
secretumvault recovery-key restore \
  --shard1 ... --shard2 ... --shard3 ...

# If no recovery key exists, vault is unrecoverable
# Use recent backup instead
secretumvault backup restore vault-backup.enc

Compliance & Standards

Certifications

  • NIST PQC Standards: CRYSTALS-Kyber, Falcon, SPHINCS+
  • FIPS 140-3 Ready: Cryptographic module certification path
  • NIST SP 800-175B: Post-quantum cryptography guidance
  • EU Cyber Resilience Act: PQC readiness

Export Controls

SecretumVault is subject to cryptography export controls in some jurisdictions. Ensure compliance with local regulations.

Encryption

Secure Communication

Certificate Management

Compliance

Security Testing

Provisioning Logo

Provisioning

Development

Comprehensive guides for developers building extensions, custom providers, plugins, and integrations on the Provisioning platform.

Overview

Provisioning is designed to be extended and customized for specific infrastructure needs. This section provides everything needed to:

  • Build custom cloud providers interfacing with any infrastructure platform via the Provider SDK
  • Create custom detectors for domain-specific infrastructure analysis and anomaly detection
  • Develop task services for specialized infrastructure operations beyond built-in services
  • Write Nushell plugins for high-performance scripting extensions
  • Integrate external systems via REST APIs and the MCP (Model Context Protocol)
  • Understand platform internals for daemon architecture, caching, and performance optimization

The platform uses modern Rust with async/await, Nushell for scripting, and Nickel for configuration - all with production-ready code examples.

Development Guides

Extension Development

Platform Internals

Integration and APIs

  • API Guide - REST API integration with authentication, pagination, error handling, rate limiting

  • Build System - Cargo configuration, feature flags, dependencies, cross-platform compilation

  • Testing - Unit, integration, property-based testing, benchmarking, CI/CD patterns

Community

  • Contributing - Guidelines, standards, review process, licensing

Quick Start Paths

I want to build a custom provider

Start with Custom Provider Development - includes template, credential patterns, error handling, tests, and publishing workflow.

I want to create custom detectors

See Custom Detector Development - covers analysis frameworks, state tracking, testing, and marketplace distribution.

I want to extend with Nushell

Read Plugin Development - FFI bindings, type safety, performance optimization, and integration patterns.

I want to understand system performance

Study Provisioning Daemon Internals - architecture, caching strategy, connection pooling, metrics collection.

I want to integrate external systems

Check API Guide - REST endpoints, authentication, webhooks, and integration patterns.

Technology Stack

  • Language: Rust (async/await with Tokio), Nushell (scripting)
  • Configuration: Nickel (type-safe) + TOML (generated)
  • Testing: Unit tests, integration tests, property-based tests
  • Performance: Prometheus metrics, connection pooling, LRU caching
  • Security: Post-quantum cryptography, type-safety, secure defaults

Development Environment

All development builds with:

cargo build --release
cargo test --all
cargo clippy -- -D warnings

Related documentation:

  • For architecture insights → See provisioning/docs/src/architecture/
  • For API details → See provisioning/docs/src/api-reference/
  • For examples → See provisioning/docs/src/examples/
  • For deployment → See provisioning/docs/src/operations/

Extension Development

Creating custom extensions to add providers, task services, and clusters to the Provisioning platform.

Extension Overview

Extensions are modular components that extend platform capabilities:

| Extension Type | Purpose | Implementation | Complexity |
|---|---|---|---|
| Providers | Cloud infrastructure backends | Nushell scripts + Nickel schemas | Moderate |
| Task Services | Infrastructure components | Nushell installation scripts | Simple |
| Clusters | Complete deployments | Nickel schemas + orchestration | Moderate |
| Workflows | Automation templates | Nickel workflow definitions | Simple |

Extension Structure

Standard extension directory layout:

provisioning/extensions/<type>/<name>/
├── nickel/
│   ├── schema.ncl      # Nickel type definitions
│   ├── defaults.ncl    # Default configuration
│   └── validation.ncl  # Validation rules
├── scripts/
│   ├── install.nu      # Installation script
│   ├── uninstall.nu    # Removal script
│   └── validate.nu     # Validation script
├── templates/
│   └── config.template # Configuration templates
├── tests/
│   └── test_*.nu       # Test scripts
├── docs/
│   └── README.md       # Documentation
└── metadata.toml       # Extension metadata

Extension Metadata

Every extension requires metadata.toml:

# metadata.toml
[extension]
name = "my-provider"
type = "provider"
version = "1.0.0"
description = "Custom cloud provider"
author = "Your Name <email@example.com>"
license = "MIT"

[dependencies]
nushell = ">=0.109.0"
nickel = ">=1.15.1"

[dependencies.extensions]
# Other extensions this depends on
base-provider = "1.0.0"

[capabilities]
create_server = true
delete_server = true
create_network = true

[configuration]
required_fields = ["api_key", "region"]
optional_fields = ["timeout", "retry_attempts"]

Creating a Provider Extension

Providers implement cloud infrastructure backends.

Provider Structure

provisioning/extensions/providers/my-provider/
├── nickel/
│   ├── schema.ncl
│   ├── server.ncl
│   └── network.ncl
├── scripts/
│   ├── create_server.nu
│   ├── delete_server.nu
│   ├── list_servers.nu
│   └── validate.nu
├── templates/
│   └── server.template
├── tests/
│   └── test_provider.nu
└── metadata.toml

Provider Schema (Nickel)

# nickel/schema.ncl
{
  Provider = {
    name | String,
    api_key | String,
    region | String,
    timeout | default = 30 | Number,

    server_config = {
      default_plan | default = "medium" | String,
      allowed_plans | Array String,
    },
  },

  Server = {
    name | String,
    plan | String,
    zone | String,
    hostname | String,
    tags | default = [] | Array String,
  },
}

Provider Implementation (Nushell)

# scripts/create_server.nu
#!/usr/bin/env nu

# Create server using provider API
export def main [
    config: record  # Provider configuration
    server: record  # Server specification
] {
    # Validate configuration
    validate-config $config

    # Construct API request
    let request = {
        name: $server.name
        plan: $server.plan
        zone: $server.zone
    }

    # Call provider API
    let response = http post $"($config.api_endpoint)/servers" {
        headers: {
            Authorization: $"Bearer ($config.api_key)"
        }
        body: ($request | to json)
    }

    # Return server details
    $response | from json
}

# Validate provider configuration
def validate-config [config: record] {
    if ($config.api_key | is-empty) {
        error make {msg: "api_key is required"}
    }

    if ($config.region | is-empty) {
        error make {msg: "region is required"}
    }
}

Provider Interface Contract

All providers must implement:

# Required operations
create_server    # Create new server
delete_server    # Delete existing server
get_server       # Get server details
list_servers     # List all servers
server_status    # Check server status

# Optional operations
create_network   # Create network
delete_network   # Delete network
attach_storage   # Attach storage volume
create_snapshot  # Create server snapshot

Creating a Task Service Extension

Task services are installable infrastructure components.

Task Service Structure

provisioning/extensions/taskservs/my-service/
├── nickel/
│   ├── schema.ncl
│   └── defaults.ncl
├── scripts/
│   ├── install.nu
│   ├── uninstall.nu
│   ├── health.nu
│   └── validate.nu
├── templates/
│   ├── config.yaml.template
│   └── systemd.service.template
├── tests/
│   └── test_service.nu
├── docs/
│   └── README.md
└── metadata.toml

Task Service Metadata

# metadata.toml
[extension]
name = "my-service"
type = "taskserv"
version = "2.1.0"
description = "Custom infrastructure service"

[dependencies.taskservs]
# Task services this depends on
containerd = ">=1.7.0"
kubernetes = ">=1.28.0"

[installation]
requires_root = true
platforms = ["linux"]
architectures = ["x86_64", "aarch64"]

[health_check]
enabled = true
endpoint = "http://localhost:8000/health"
interval = 30
timeout = 5

Task Service Installation Script

# scripts/install.nu
#!/usr/bin/env nu

export def main [
    config: record  # Service configuration
    server: record  # Target server details
] {
    print "Installing my-service..."

    # Download binaries
    let version = $config.version? | default "latest"
    download-binary $version

    # Install systemd service
    install-systemd-service $config

    # Configure service
    generate-config $config

    # Start service
    start-service

    # Verify installation
    verify-installation

    print "Installation complete"
}

def download-binary [version: string] {
    let url = $"https://github.com/org/my-service/releases/download/($version)/my-service"
    http get $url | save /usr/local/bin/my-service
    chmod +x /usr/local/bin/my-service
}

def install-systemd-service [config: record] {
    let template = open ../templates/systemd.service.template
    let rendered = $template | str replace --all "{{VERSION}}" $config.version
    $rendered | save /etc/systemd/system/my-service.service
    systemctl daemon-reload
}

def start-service [] {
    systemctl enable my-service
    systemctl start my-service
}

def verify-installation [] {
    let status = systemctl is-active my-service
    if $status != "active" {
        error make {msg: "Service failed to start"}
    }

    # Health check
    sleep 5sec
    let health = http get http://localhost:8000/health
    if $health.status != "healthy" {
        error make {msg: "Health check failed"}
    }
}
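
The `verify-installation` step above sleeps once and probes once; a more resilient pattern polls with bounded retries. A Python sketch with the HTTP call abstracted behind an injected `probe` callable (names are illustrative):

```python
import time

def wait_until_healthy(probe, retries: int = 10, delay: float = 1.0) -> bool:
    """Call `probe()` until it reports healthy, retrying up to `retries` times.

    `probe` is any zero-argument callable returning a status string, e.g. a
    wrapper around `GET /health`. Returns True as soon as it sees "healthy".
    """
    for _ in range(retries):
        try:
            if probe() == "healthy":
                return True
        except ConnectionError:
            pass  # service not listening yet
        time.sleep(delay)
    return False

# Simulated service that becomes healthy on the third probe
state = {"calls": 0}
def fake_probe():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("connection refused")
    return "healthy"

print(wait_until_healthy(fake_probe, retries=5, delay=0.0))  # → True
```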

Creating a Cluster Extension

Clusters combine servers and task services into complete deployments.

Cluster Schema

# nickel/schema.ncl
{
  Cluster = {
    metadata = {
      name | String,
      provider | String,
      environment | default = "production" | String,
    },

    infrastructure = {
      servers | Array {
        name | String,
        role | [| 'control, 'worker, 'storage |],
        plan | String,
      },
    },

    services = {
      taskservs | Array String,
      order | default = [] | Array String,
    },

    networking = {
      private_network | default = true | Bool,
      cidr | default = "10.0.0.0/16" | String,
    },
  },
}

Cluster Definition Example

# clusters/kubernetes-ha.ncl
{
  metadata.name = "k8s-ha-cluster",
  metadata.provider = "upcloud",

  infrastructure.servers = [
    {name = "control-01", role = "control", plan = "large"},
    {name = "control-02", role = "control", plan = "large"},
    {name = "control-03", role = "control", plan = "large"},
    {name = "worker-01", role = "worker", plan = "xlarge"},
    {name = "worker-02", role = "worker", plan = "xlarge"},
  ],

  services.taskservs = ["containerd", "etcd", "kubernetes", "cilium"],
  services.order = ["containerd", "etcd", "kubernetes", "cilium"],

  networking.private_network = true,
  networking.cidr = "10.100.0.0/16",
}
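
When `services.order` is omitted, an orchestrator can derive the order from taskserv dependencies (for example, kubernetes requires containerd and etcd first). A Python sketch using a topological sort; the dependency map here is illustrative, not the platform's actual metadata:

```python
from graphlib import TopologicalSorter

# Illustrative dependency map: service -> services it requires first
DEPENDENCIES = {
    "containerd": [],
    "etcd": [],
    "kubernetes": ["containerd", "etcd"],
    "cilium": ["kubernetes"],
}

def install_order(taskservs: list[str]) -> list[str]:
    """Return an installation order that respects declared dependencies."""
    graph = {svc: DEPENDENCIES.get(svc, []) for svc in taskservs}
    return [svc for svc in TopologicalSorter(graph).static_order()
            if svc in taskservs]

print(install_order(["cilium", "kubernetes", "etcd", "containerd"]))
# kubernetes lands after containerd and etcd; cilium is installed last
```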

Extension Testing

Test Structure

# tests/test_provider.nu
use std assert

# Test provider configuration validation
export def test_validate_config [] {
    let valid_config = {
        api_key: "test-key"
        region: "us-east-1"
    }

    let result = validate-config $valid_config
    assert equal $result.valid true
}

# Test server creation
export def test_create_server [] {
    let config = load-test-config
    let server_spec = {
        name: "test-server"
        plan: "medium"
        zone: "us-east-1a"
    }

    let result = create-server $config $server_spec
    assert equal $result.status "created"
}

# Run all tests
export def main [] {
    test_validate_config
    test_create_server
    print "All tests passed"
}

Run tests:

# Test extension
provisioning extension test my-provider

# Test specific component
nu tests/test_provider.nu

Extension Packaging

OCI Registry Publishing

Package and publish extension:

# Build extension package
provisioning extension build my-provider

# Validate package
provisioning extension validate my-provider-1.0.0.tar.gz

# Publish to registry
provisioning extension publish my-provider-1.0.0.tar.gz \
  --registry registry.example.com

Package structure:

my-provider-1.0.0.tar.gz
├── metadata.toml
├── nickel/
├── scripts/
├── templates/
├── tests/
├── docs/
└── manifest.json

Extension Installation

Install extension from registry:

# Install from OCI registry
provisioning extension install my-provider --version 1.0.0

# Install from local file
provisioning extension install ./my-provider-1.0.0.tar.gz

# List installed extensions
provisioning extension list

# Update extension
provisioning extension update my-provider --version 1.1.0

# Uninstall extension
provisioning extension uninstall my-provider

Best Practices

  • Follow naming conventions: lowercase with hyphens
  • Version extensions semantically (semver)
  • Document all configuration options
  • Provide comprehensive tests
  • Include usage examples in docs
  • Validate input parameters
  • Handle errors gracefully
  • Log important operations
  • Support idempotent operations
  • Keep dependencies minimal

Provider Development

Implementing custom cloud provider integrations for the Provisioning platform.

Provider Architecture

Providers abstract cloud infrastructure APIs through a unified interface, allowing infrastructure definitions to be portable across clouds.

Provider Interface

All providers must implement these core operations:

# Server lifecycle
create_server     # Provision new server
delete_server     # Remove server
get_server        # Fetch server details
list_servers      # List all servers
update_server     # Modify server configuration
server_status     # Get current state

# Network operations (optional)
create_network    # Create private network
delete_network    # Remove network
attach_network    # Attach server to network

# Storage operations (optional)
attach_volume     # Attach storage volume
detach_volume     # Detach storage volume
create_snapshot   # Snapshot server disk
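
In the platform these operations are implemented as Nushell scripts, but the required shape of the contract can be sketched as a Rust trait to make it explicit. Everything below is illustrative only; the platform does not define this trait, and the types are assumptions:

```rust
// Hypothetical sketch: the provider contract as a trait. The operation
// names mirror the list above; error and server types are simplified.
#[derive(Debug, Clone, PartialEq)]
struct Server {
    id: String,
    name: String,
    status: String,
}

trait Provider {
    fn create_server(&self, name: &str) -> Result<Server, String>;
    fn delete_server(&self, id: &str) -> Result<(), String>;
    fn get_server(&self, id: &str) -> Result<Server, String>;
    fn list_servers(&self) -> Result<Vec<Server>, String>;
}

// Trivial in-memory implementation, used only to show the contract's shape.
struct MockProvider;

impl Provider for MockProvider {
    fn create_server(&self, name: &str) -> Result<Server, String> {
        Ok(Server { id: format!("srv-{name}"), name: name.into(), status: "running".into() })
    }
    fn delete_server(&self, _id: &str) -> Result<(), String> {
        Ok(())
    }
    fn get_server(&self, id: &str) -> Result<Server, String> {
        Ok(Server { id: id.into(), name: "demo".into(), status: "running".into() })
    }
    fn list_servers(&self) -> Result<Vec<Server>, String> {
        Ok(vec![])
    }
}

fn main() {
    let p = MockProvider;
    let s = p.create_server("web-1").unwrap();
    println!("{} {}", s.id, s.status); // srv-web-1 running
}
```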

Provider Template

Use the official provider template:

# Generate provider scaffolding
provisioning generate provider --name my-cloud --template standard

# Creates:
# extensions/providers/my-cloud/
# ├── nickel/
# │   ├── schema.ncl
# │   ├── server.ncl
# │   └── network.ncl
# ├── scripts/
# │   ├── create_server.nu
# │   ├── delete_server.nu
# │   └── list_servers.nu
# └── metadata.toml

Provider Schema (Nickel)

Define provider configuration schema:

# nickel/schema.ncl
{
  ProviderConfig = {
    name | String,
    api_endpoint | String,
    api_key | String,
    region | String,
    timeout | Number | default = 30,
    retry_attempts | Number | default = 3,

    plans = {
      small  = {cpu = 2, memory = 4096, disk = 25},
      medium = {cpu = 4, memory = 8192, disk = 50},
      large  = {cpu = 8, memory = 16384, disk = 100},
    },

    regions | Array String,
  },

  ServerSpec = {
    name | String,
    plan | String,
    zone | String,
    image | String | default = "ubuntu-24.04",
    ssh_keys | Array String,
    user_data | String | default = "",
  },
}

Implementing Server Creation

Create server implementation:

# scripts/create_server.nu
#!/usr/bin/env nu

export def main [
    config: record,  # Provider configuration
    spec: record     # Server specification
]: nothing -> record {
    # Validate inputs
    validate-provider-config $config
    validate-server-spec $spec

    # Map plan to provider-specific values
    let plan = get-plan-details $config $spec.plan

    # Construct API request
    let request = {
        hostname: $spec.name
        plan: $plan.name
        zone: $spec.zone
        storage_devices: [{
            action: "create"
            storage: $plan.disk
            title: "root"
        }]
        login: {
            user: "root"
            keys: $spec.ssh_keys
        }
    }

    # Call provider API with retry logic
    let server = retry-api-call { ||
        http post --content-type application/json --headers [Authorization $"Bearer ($config.api_key)"] $"($config.api_endpoint)/server" $request
    } $config.retry_attempts

    # Wait for server to be ready
    wait-for-server-ready $config $server.uuid

    # Return server details
    {
        id: $server.uuid
        name: $server.hostname
        ip_address: $server.ip_addresses.0.address
        status: "running"
        provider: $config.name
    }
}

def validate-provider-config [config: record] {
    if ($config.api_key | is-empty) {
        error make {msg: "API key required"}
    }
    if ($config.region | is-empty) {
        error make {msg: "Region required"}
    }
}

def get-plan-details [config: record, plan_name: string]: nothing -> record {
    $config.plans | get $plan_name
}

def retry-api-call [operation: closure, max_attempts: int]: nothing -> any {
    mut attempt = 1
    mut last_error = null

    while $attempt <= $max_attempts {
        try {
            return (do $operation)
        } catch { |err|
            $last_error = $err
            if $attempt < $max_attempts {
                sleep (1sec * $attempt)  # Linear backoff: wait longer each attempt
            }
            $attempt = $attempt + 1
        }
    }

    error make {msg: $"API call failed after ($max_attempts) attempts: ($last_error)"}
}

def wait-for-server-ready [config: record, server_id: string] {
    mut ready = false
    mut attempts = 0
    let max_wait = 120  # 2 minutes

    while not $ready and $attempts < $max_wait {
        let status = http get --headers [Authorization $"Bearer ($config.api_key)"] $"($config.api_endpoint)/server/($server_id)"

        if $status.state == "started" {
            $ready = true
        } else {
            sleep 1sec
            $attempts = $attempts + 1
        }
    }

    if not $ready {
        error make {msg: "Server failed to start within timeout"}
    }
}

Provider Testing

Comprehensive provider testing:

# tests/test_provider.nu
use std assert

export def test_create_server [] {
    # Mock provider config
    let config = {
        name: "test-cloud"
        api_endpoint: "http://localhost:8080"
        api_key: "test-key"
        region: "test-region"
        plans: {
            small: {cpu: 2, memory: 4096, disk: 25}
        }
    }

    # Mock server spec
    let spec = {
        name: "test-server"
        plan: "small"
        zone: "test-zone"
        ssh_keys: ["ssh-rsa AAAA..."]
    }

    # Test server creation
    let server = create-server $config $spec

    assert ($server.id != null)
    assert ($server.name == "test-server")
    assert ($server.status == "running")
}

export def test_list_servers [] {
    let config = load-test-config
    let servers = list-servers $config

    assert ($servers | length) > 0
}

export def main [] {
    print "Running provider tests..."
    test_create_server
    test_list_servers
    print "All tests passed!"
}

Error Handling

Robust error handling for provider operations:

# Handle API errors gracefully
def handle-api-error [error: record]: nothing -> record {
    match $error.status {
        401 => {error make {msg: "Authentication failed - check API key"}}
        403 => {error make {msg: "Permission denied - insufficient privileges"}}
        404 => {error make {msg: "Resource not found"}}
        429 => {error make {msg: "Rate limit exceeded - retry later"}}
        500 => {error make {msg: "Provider API error - contact support"}}
        _   => {error make {msg: $"Unknown error: ($error.message)"}}
    }
}

Provider Best Practices

  • Implement idempotent operations where possible
  • Handle rate limiting with exponential backoff
  • Validate all inputs before API calls
  • Log all API requests and responses (without secrets)
  • Use connection pooling for better performance
  • Cache provider capabilities and quotas
  • Implement proper timeout handling
  • Return consistent error messages
  • Test against provider sandbox/staging environment
  • Version provider schemas carefully
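
The logging guideline deserves emphasis: request logs must never leak credentials. A minimal redaction helper might look like this (illustrative Rust sketch; names and the four-character prefix are arbitrary choices):

```rust
// Illustrative: truncate a bearer token before it reaches any log line.
fn redact_auth_header(header: &str) -> String {
    match header.strip_prefix("Authorization: Bearer ") {
        Some(token) => {
            let keep = token.len().min(4); // keep a short prefix for correlation
            format!("Authorization: Bearer {}****", &token[..keep])
        }
        None => header.to_string(), // non-sensitive headers pass through unchanged
    }
}

fn main() {
    let redacted = redact_auth_header("Authorization: Bearer abcd1234secret");
    assert_eq!(redacted, "Authorization: Bearer abcd****");
    assert_eq!(redact_auth_header("Accept: json"), "Accept: json");
    println!("{redacted}");
}
```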

Plugin Development

Developing Nushell plugins for performance-critical operations in the Provisioning platform.

Plugin Overview

Nushell plugins provide 10-50x performance improvement over HTTP APIs through native Rust implementations.

Available Plugins

| Plugin | Purpose | Performance Gain | Language |
|---|---|---|---|
| nu_plugin_auth | Authentication and OS keyring | 5x faster | Rust |
| nu_plugin_kms | KMS encryption operations | 10x faster | Rust |
| nu_plugin_orchestrator | Orchestrator queries | 30x faster | Rust |

Plugin Architecture

Plugins communicate with Nushell via MessagePack protocol:

Nushell ←→ MessagePack ←→ Plugin Process
  ↓                           ↓
Script                    Native Rust

Creating a Plugin

Plugin Template

Generate plugin scaffold:

# Create new plugin (plugins build as standalone binaries)
cargo new nu_plugin_myfeature
cd nu_plugin_myfeature

Add dependencies to Cargo.toml:

[package]
name = "nu_plugin_myfeature"
version = "0.1.0"
edition = "2021"

[dependencies]
nu-plugin = "0.109.0"
nu-protocol = "0.109.0"
serde = {version = "1.0", features = ["derive"]}

Plugin Implementation

Implement plugin interface:

// src/main.rs
use nu_plugin::{EvaluatedCall, LabeledError, Plugin};
use nu_protocol::{Category, PluginSignature, SyntaxShape, Type, Value};

pub struct MyFeaturePlugin;

impl Plugin for MyFeaturePlugin {
    fn signature(&self) -> Vec<PluginSignature> {
        vec![
            PluginSignature::build("my-feature")
                .usage("Perform my feature operation")
                .required("input", SyntaxShape::String, "input value")
                .input_output_type(Type::String, Type::String)
                .category(Category::Custom("provisioning".into())),
        ]
    }

    fn run(
        &mut self,
        name: &str,
        call: &EvaluatedCall,
        input: &Value,
    ) -> Result<Value, LabeledError> {
        match name {
            "my-feature" => self.my_feature(call, input),
            _ => Err(LabeledError {
                label: "Unknown command".into(),
                msg: format!("Unknown command: {}", name),
                span: None,
            }),
        }
    }
}

impl MyFeaturePlugin {
    fn my_feature(&self, call: &EvaluatedCall, _input: &Value) -> Result<Value, LabeledError> {
        let input: String = call.req(0)?;

        // Perform operation
        let result = perform_operation(&input);

        Ok(Value::string(result, call.head))
    }
}

fn perform_operation(input: &str) -> String {
    // Your implementation here
    format!("Processed: {}", input)
}

// Plugin entry point
fn main() {
    nu_plugin::serve_plugin(&mut MyFeaturePlugin, nu_plugin::MsgPackSerializer {})
}

Building Plugin

# Build release version
cargo build --release

# Install plugin
nu -c 'plugin add target/release/nu_plugin_myfeature'
nu -c 'plugin use myfeature'

# Test plugin
nu -c 'my-feature "test input"'

Plugin Performance Optimization

Benchmarking

use std::time::Instant;

pub fn benchmark_operation() {
    let start = Instant::now();

    // Operation to benchmark
    perform_expensive_operation();

    let duration = start.elapsed();
    eprintln!("Operation took: {:?}", duration);
}

Caching

Implement caching for expensive operations:

use std::collections::HashMap;
use std::sync::{Arc, Mutex};

pub struct CachedPlugin {
    cache: Arc<Mutex<HashMap<String, String>>>,
}

impl CachedPlugin {
    fn get_or_compute(&self, key: &str) -> String {
        let mut cache = self.cache.lock().unwrap();

        if let Some(value) = cache.get(key) {
            return value.clone();
        }

        let value = expensive_computation(key);
        cache.insert(key.to_string(), value.clone());
        value
    }
}
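
The same pattern in self-contained form, demonstrating that the expensive path runs only once per key (the counter stands in for the real computation):

```rust
// Shared cache mirroring the pattern above; `count` tracks how often the
// expensive path actually runs.
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

fn get_or_compute(
    cache: &Arc<Mutex<HashMap<String, String>>>,
    count: &mut u32,
    key: &str,
) -> String {
    let mut c = cache.lock().unwrap();
    if let Some(v) = c.get(key) {
        return v.clone(); // cache hit: no computation
    }
    *count += 1; // the expensive computation would happen here
    let v = format!("value-for-{key}");
    c.insert(key.to_string(), v.clone());
    v
}

fn main() {
    let cache = Arc::new(Mutex::new(HashMap::new()));
    let mut count = 0;
    let a = get_or_compute(&cache, &mut count, "k1");
    let b = get_or_compute(&cache, &mut count, "k1");
    assert_eq!(a, b);
    assert_eq!(count, 1); // second call hit the cache
    println!("computed {count} time(s)");
}
```

Note that this cache never evicts; for long-running plugins a bounded or LRU cache is usually the safer choice.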

Testing Plugins

Unit Tests

#[cfg(test)]
mod tests {
    use super::*;
    use nu_protocol::{Span, Value};

    #[test]
    fn test_my_feature() {
        let plugin = MyFeaturePlugin;
        let input = Value::string("test", Span::test_data());
        let result = plugin.my_feature(&mock_call(), &input).unwrap();

        assert_eq!(result.as_string().unwrap(), "Processed: test");
    }

    fn mock_call() -> EvaluatedCall {
        // Mock EvaluatedCall for testing
        todo!()
    }
}

Integration Tests

# tests/test_plugin.nu
use std assert

def test_plugin_functionality [] {
    let result = my-feature "test input"
    assert equal $result "Processed: test input"
}

def main [] {
    test_plugin_functionality
    print "Plugin tests passed"
}

Plugin Best Practices

  • Keep plugin logic focused and single-purpose
  • Minimize dependencies to reduce binary size
  • Use async operations for I/O-bound tasks
  • Implement proper error handling
  • Document all plugin commands
  • Version plugins with semantic versioning
  • Provide fallback to HTTP API if plugin unavailable
  • Cache expensive computations
  • Profile and benchmark performance improvements

API Integration Guide

Integrate third-party APIs with Provisioning infrastructure.

API Client Development

Create clients for external APIs:

// src/api_client.rs
use reqwest::{Client, Response};

pub struct ApiClient {
    endpoint: String,
    api_key: String,
    client: Client,
}

impl ApiClient {
    pub async fn call(&self, path: &str) -> Result<Response, reqwest::Error> {
        let url = format!("{}{}", self.endpoint, path);
        self.client
            .get(&url)
            .bearer_auth(&self.api_key)
            .send()
            .await
    }
}

Webhook Integration

Handle webhooks from external systems:

use actix_web::{post, web, HttpResponse, Responder};

#[post("/webhooks/{service}")]
pub async fn handle_webhook(path: web::Path<String>, body: web::Bytes) -> impl Responder {
    let service = path.into_inner();
    match service.as_str() {
        "github" => handle_github_webhook(&body),
        "stripe" => handle_stripe_webhook(&body),
        _ => HttpResponse::NotFound().finish(),
    }
}

Error Handling

Robust error handling for API calls with retries:

use std::time::Duration;

pub async fn call_api_with_retry(
    client: &ApiClient,
    path: &str,
    max_retries: u32,
) -> Result<Response> {
    for attempt in 0..max_retries {
        match client.call(path).await {
            Ok(response) => return Ok(response),
            Err(_) if attempt < max_retries - 1 => {
                // Exponential backoff: 1s, 2s, 4s, ...
                let delay = Duration::from_secs(2_u64.pow(attempt));
                tokio::time::sleep(delay).await;
            }
            Err(e) => return Err(e),
        }
    }
    Err(ApiError::MaxRetriesExceeded.into())
}
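
The delay schedule produced by `2_u64.pow(attempt)` grows as 1s, 2s, 4s, 8s for successive attempts. A tiny self-contained check of that schedule:

```rust
// The retry schedule used above: 2^attempt seconds between attempts.
fn backoff_secs(attempt: u32) -> u64 {
    2_u64.pow(attempt)
}

fn main() {
    let delays: Vec<u64> = (0..4).map(backoff_secs).collect();
    assert_eq!(delays, vec![1, 2, 4, 8]);
    println!("{delays:?}"); // [1, 2, 4, 8]
}
```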

Build System

Building, testing, and packaging the Provisioning platform and extensions with Cargo, Just, and Nickel.

Build Tools

| Tool | Purpose | Version Required |
|---|---|---|
| Cargo | Rust compilation and testing | Latest stable |
| Just | Task runner for common operations | Latest |
| Nickel | Schema validation and type checking | 1.15.1+ |
| Nushell | Script execution and testing | 0.109.0+ |

Building Platform Services

Build All Services

# Build all Rust services in release mode
cd provisioning/platform
cargo build --release --workspace

# Or using just task runner
just build-platform

Binary outputs in target/release/:

  • provisioning-orchestrator
  • provisioning-control-center
  • provisioning-vault-service
  • provisioning-installer

Build Individual Service

# Orchestrator service
cd provisioning/platform/crates/orchestrator
cargo build --release

# Control Center service
cd provisioning/platform/crates/control-center
cargo build --release

# Development build (faster compilation)
cargo build

Testing

Run All Tests

# Rust unit and integration tests
cargo test --workspace

# Nushell script tests
just test-nushell

# Complete test suite
just test-all

Test Specific Component

# Test orchestrator crate
cargo test -p provisioning-orchestrator

# Test with output visible
cargo test -p provisioning-orchestrator -- --nocapture

# Test specific function
cargo test -p provisioning-orchestrator test_workflow_creation

# Run tests matching pattern
cargo test workflow

Security Tests

# Run 350+ security test cases
cargo test -p security --test '*'

# Specific security component
cargo test -p security authentication
cargo test -p security authorization
cargo test -p security kms

Code Quality

Formatting

# Format all Rust code
cargo fmt --all

# Check formatting without modifying
cargo fmt --all -- --check

# Format Nickel schemas
nickel fmt provisioning/schemas/**/*.ncl

Linting

# Run Clippy linter
cargo clippy --all -- -D warnings

# Auto-fix Clippy warnings
cargo clippy --all --fix

# Clippy with all features enabled
cargo clippy --all --all-features -- -D warnings

Nickel Validation

# Type check Nickel schemas
nickel typecheck provisioning/schemas/main.ncl

# Evaluate schema
nickel eval provisioning/schemas/main.ncl

# Format Nickel files
nickel fmt provisioning/schemas/**/*.ncl

Continuous Integration

The platform uses automated CI workflows for quality assurance.

GitHub Actions Pipeline

Key CI jobs:

1. Rust Build and Test
   - cargo build --release --workspace
   - cargo test --workspace
   - cargo clippy --all -- -D warnings

2. Nushell Validation
   - nu --check core/cli/provisioning
   - Run Nushell test suite

3. Nickel Schema Validation
   - nickel typecheck schemas/main.ncl
   - Validate all schema files

4. Security Tests
   - Run 350+ security test cases
   - Vulnerability scanning

5. Documentation Build
   - mdbook build docs
   - Markdown linting

Packaging and Distribution

Create Release Package

# Build optimized binaries
cargo build --release --workspace

# Strip debug symbols (reduce binary size)
strip target/release/provisioning-orchestrator
strip target/release/provisioning-control-center

# Create distribution archive
just package

Package Structure

provisioning-5.0.0-linux-x86_64.tar.gz
├── bin/
│   ├── provisioning                    # Main CLI
│   ├── provisioning-orchestrator       # Orchestrator service
│   ├── provisioning-control-center     # Control Center
│   ├── provisioning-vault-service      # Vault service
│   └── provisioning-installer          # Platform installer
├── lib/
│   └── nulib/                          # Nushell libraries
├── schemas/                            # Nickel schemas
├── config/
│   └── config.defaults.toml            # Default configuration
├── systemd/
│   └── *.service                       # Systemd unit files
└── README.md

Cross-Platform Builds

Supported Targets

# Linux x86_64 (primary platform)
cargo build --release --target x86_64-unknown-linux-gnu

# Linux ARM64 (Raspberry Pi, cloud ARM instances)
cargo build --release --target aarch64-unknown-linux-gnu

# macOS x86_64
cargo build --release --target x86_64-apple-darwin

# macOS ARM64 (Apple Silicon)
cargo build --release --target aarch64-apple-darwin

Cross-Compilation Setup

# Add target architectures
rustup target add x86_64-unknown-linux-gnu
rustup target add aarch64-unknown-linux-gnu

# Install cross-compilation tool
cargo install cross

# Cross-compile with Docker
cross build --release --target aarch64-unknown-linux-gnu

Just Task Runner

Common build tasks in justfile:

# Build all components
build-all: build-platform build-plugins

# Build platform services
build-platform:
    cd platform && cargo build --release --workspace

# Run all tests
test: test-rust test-nushell test-integration

# Test Rust code
test-rust:
    cargo test --workspace

# Test Nushell scripts
test-nushell:
    nu scripts/test/test_all.nu

# Format all code
fmt:
    cargo fmt --all
    nickel fmt schemas/**/*.ncl

# Lint all code
lint:
    cargo clippy --all -- -D warnings
    nickel typecheck schemas/main.ncl

# Create release package
package:
    ./scripts/package.nu

# Clean build artifacts
clean:
    cargo clean
    rm -rf target/

Usage examples:

just build-all     # Build everything
just test          # Run all tests
just fmt           # Format code
just lint          # Run linters
just package       # Create distribution
just clean         # Remove artifacts

Performance Optimization

Release Builds

# Cargo.toml
[profile.release]
opt-level = 3              # Maximum optimization
lto = "fat"                # Link-time optimization
codegen-units = 1          # Better optimization, slower compile
strip = true               # Strip debug symbols
panic = "abort"            # Smaller binary size

Build Time Optimization

# Cargo.toml
[profile.dev]
opt-level = 1              # Basic optimization
incremental = true         # Faster recompilation

Speed up compilation:

# Use faster linker (Linux)
sudo apt install lld
export RUSTFLAGS="-C link-arg=-fuse-ld=lld"

# Parallel compilation
cargo build -j 8

# Use cargo-watch for auto-rebuild
cargo install cargo-watch
cargo watch -x build

Development Workflow

# 1. Start development
just clean
just build-all

# 2. Make changes to code

# 3. Test changes quickly
cargo check                # Fast syntax check
cargo test <specific-test> # Test specific functionality

# 4. Full validation before commit
just fmt
just lint
just test

# 5. Create package for testing
just package

Hot Reload Development

# Auto-rebuild on file changes
cargo watch -x build

# Auto-test on changes
cargo watch -x test

# Run service with auto-reload
cargo watch -x 'run --bin provisioning-orchestrator'

Debugging Builds

Debug Information

# Build with full debug info
cargo build

# Build with debug info in release mode
cargo build --release --profile release-with-debug

# Run with backtraces
RUST_BACKTRACE=1 cargo run
RUST_BACKTRACE=full cargo run

Build Verbosity

# Verbose build output
cargo build -vv

# Show build commands
cargo build -vvv

# Show timing information
cargo build --timings

Dependency Tree

# View dependency tree
cargo tree

# Duplicate dependencies
cargo tree --duplicates

# Build graph visualization
cargo depgraph | dot -Tpng > deps.png

Best Practices

  • Always run just test before committing
  • Use cargo fmt and cargo clippy for code quality
  • Test on multiple platforms before release
  • Strip binaries for production distributions
  • Version binaries with semantic versioning
  • Cache dependencies in CI/CD
  • Use release profile for production builds
  • Document build requirements in README
  • Automate common tasks with Just
  • Keep build times reasonable (<5 min)

Troubleshooting

Common Build Issues

Compilation fails with linker error:

# Install build dependencies
sudo apt install build-essential pkg-config libssl-dev

Out of memory during build:

# Reduce parallel jobs
cargo build -j 2

# Use more swap space
sudo fallocate -l 8G /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

Clippy warnings:

# Fix automatically where possible
cargo clippy --all --fix

# Allow specific lints temporarily
#[allow(clippy::too_many_arguments)]
See Also

  • Testing - Testing strategies and procedures
  • Contributing - Contribution guidelines including build requirements

Testing

Comprehensive testing strategies for the Provisioning platform including unit tests, integration tests, and 350+ security tests.

Testing Overview

The platform maintains extensive test coverage across multiple test types:

| Test Type | Count | Coverage Target | Average Runtime |
|---|---|---|---|
| Unit Tests | 200+ | Core logic 80%+ | < 5 seconds |
| Integration Tests | 100+ | Component integration 60%+ | < 30 seconds |
| Security Tests | 350+ | Security components 100% | < 60 seconds |
| End-to-End Tests | 50+ | Full workflows | < 5 minutes |

Running Tests

All Tests

# Run complete test suite
cargo test --workspace

# With output visible
cargo test --workspace -- --nocapture

# Parallel execution with 8 test threads
cargo test --workspace -- --test-threads 8

# Include ignored tests
cargo test --workspace -- --ignored

Test by Category

# Unit tests only (--lib)
cargo test --lib

# Integration tests only (--test)
cargo test --test '*'

# Documentation tests
cargo test --doc

# Security test suite
cargo test -p security --test '*'

Test Specific Component

# Test orchestrator crate
cargo test -p provisioning-orchestrator

# Test control center
cargo test -p provisioning-control-center

# Test specific module
cargo test -p provisioning-orchestrator workflows::

# Test specific function
cargo test -p provisioning-orchestrator test_workflow_creation

Unit Testing

Unit tests verify individual functions and modules in isolation.

Rust Unit Tests

// src/workflows.rs
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_create_workflow() {
        let config = WorkflowConfig {
            name: "test-workflow".into(),
            tasks: vec![],
        };

        let workflow = Workflow::new(config);

        assert_eq!(workflow.name(), "test-workflow");
        assert_eq!(workflow.status(), WorkflowStatus::Pending);
    }

    #[test]
    fn test_workflow_execution() {
        let mut workflow = create_test_workflow();

        let result = workflow.execute();

        assert!(result.is_ok());
        assert_eq!(workflow.status(), WorkflowStatus::Completed);
    }

    #[test]
    #[should_panic(expected = "Invalid workflow")]
    fn test_invalid_workflow() {
        Workflow::new(invalid_config());
    }
}

Nushell Unit Tests

# tests/test_provider.nu
use std assert

export def test_validate_config [] {
    let config = {api_key: "test-key", region: "us-east-1"}
    let result = validate-config $config
    assert equal $result.valid true
}

export def test_create_server [] {
    let spec = {name: "test-server", plan: "medium"}
    let server = create-server test-config $spec
    assert ($server.id != null)
}

export def main [] {
    test_validate_config
    test_create_server
    print "All tests passed"
}

Run Nushell tests:

nu tests/test_provider.nu

Integration Testing

Integration tests verify components work together correctly.

Service Integration Tests

// tests/orchestrator_integration.rs
use provisioning_orchestrator::Orchestrator;
use provisioning_database::Database;

#[tokio::test]
async fn test_workflow_persistence() {
    let db = Database::new_test().await;
    let orchestrator = Orchestrator::new(db.clone());

    let workflow_id = orchestrator.create_workflow(test_config()).await.unwrap();

    // Verify workflow persisted to database
    let workflow = db.get_workflow(&workflow_id).await.unwrap();
    assert_eq!(workflow.id, workflow_id);
}

#[tokio::test]
async fn test_api_integration() {
    let app = create_test_app().await;

    let response = app
        .post("/api/v1/workflows")
        .json(&test_workflow())
        .send()
        .await
        .unwrap();

    assert_eq!(response.status(), 201);
}

Test Containers

Use Docker containers for realistic integration testing:

use testcontainers::*;

#[tokio::test]
async fn test_with_database() {
    let docker = clients::Cli::default();
    let postgres = docker.run(images::postgres::Postgres::default());

    let db_url = format!(
        "postgres://postgres@localhost:{}/test",
        postgres.get_host_port_ipv4(5432)
    );

    // Run tests against real database
    let db = Database::connect(&db_url).await.unwrap();
    // Test database operations...
}

Security Testing

Comprehensive security testing with 350+ test cases covering all security components.

Authentication Tests

#[tokio::test]
async fn test_jwt_verification() {
    let auth = AuthService::new();

    let token = auth.generate_token("user123").unwrap();
    let claims = auth.verify_token(&token).unwrap();

    assert_eq!(claims.sub, "user123");
}

#[tokio::test]
async fn test_invalid_token() {
    let auth = AuthService::new();
    let result = auth.verify_token("invalid.token.here");
    assert!(result.is_err());
}

#[tokio::test]
async fn test_token_expiration() {
    let auth = AuthService::new();
    let token = create_expired_token();
    let result = auth.verify_token(&token);
    assert!(matches!(result, Err(AuthError::TokenExpired)));
}

Authorization Tests

#[tokio::test]
async fn test_rbac_enforcement() {
    let authz = AuthorizationService::new();

    let decision = authz.authorize(
        "user:user123",
        "workflow:create",
        "resource:my-cluster"
    ).await;

    assert_eq!(decision, Decision::Allow);
}

#[tokio::test]
async fn test_policy_denial() {
    let authz = AuthorizationService::new();

    let decision = authz.authorize(
        "user:guest",
        "server:delete",
        "resource:prod-server"
    ).await;

    assert_eq!(decision, Decision::Deny);
}

Encryption Tests

#[tokio::test]
async fn test_kms_encryption() {
    let kms = KmsService::new();

    let plaintext = b"secret data";
    let ciphertext = kms.encrypt(plaintext).await.unwrap();
    let decrypted = kms.decrypt(&ciphertext).await.unwrap();

    assert_eq!(plaintext, decrypted.as_slice());
}

#[tokio::test]
async fn test_encryption_performance() {
    let kms = KmsService::new();
    let plaintext = vec![0u8; 1024]; // 1KB

    let start = Instant::now();
    kms.encrypt(&plaintext).await.unwrap();
    let duration = start.elapsed();

    // KMS encryption should complete in < 10ms
    assert!(duration < Duration::from_millis(10));
}

End-to-End Testing

Complete workflow testing from start to finish.

Full Workflow Tests

#[tokio::test]
async fn test_complete_workflow() {
    let platform = Platform::start_test_instance().await;

    // Create infrastructure
    let cluster_id = platform
        .create_cluster(test_cluster_config())
        .await
        .unwrap();

    // Wait for completion (5 minute timeout)
    platform
        .wait_for_cluster(&cluster_id, Duration::from_secs(300))
        .await;

    // Verify cluster health
    let health = platform.check_cluster_health(&cluster_id).await;
    assert!(health.is_healthy());

    // Cleanup
    platform.delete_cluster(&cluster_id).await.unwrap();
}

Test Fixtures

Shared test data and utilities.

Common Test Fixtures

// tests/fixtures/mod.rs
pub fn test_workflow_config() -> WorkflowConfig {
    WorkflowConfig {
        name: "test-workflow".into(),
        tasks: vec![
            Task::new("task1", TaskType::CreateServer),
            Task::new("task2", TaskType::InstallService),
        ],
    }
}

pub fn test_server_spec() -> ServerSpec {
    ServerSpec {
        name: "test-server".into(),
        plan: "medium".into(),
        zone: "us-east-1a".into(),
        image: "ubuntu-24.04".into(),
    }
}

Mocking

Mock external dependencies for isolated testing.

Mock External Services

use mockall::*;

#[automock]
trait CloudProvider {
    async fn create_server(&self, spec: &ServerSpec) -> Result<Server>;
}

#[tokio::test]
async fn test_with_mock_provider() {
    let mut mock_provider = MockCloudProvider::new();

    mock_provider
        .expect_create_server()
        .returning(|_| Ok(test_server()));

    let result = mock_provider.create_server(&test_spec()).await;
    assert!(result.is_ok());
}

Test Coverage

Measure and maintain code coverage.

Generate Coverage Report

# Install tarpaulin
cargo install cargo-tarpaulin

# Generate HTML coverage report
cargo tarpaulin --out Html --output-dir coverage

# Generate multiple formats
cargo tarpaulin --out Html --out Xml --out Json

# View coverage
open coverage/index.html

Coverage Goals

  • Unit tests: Minimum 80% code coverage
  • Integration tests: Minimum 60% component coverage
  • Critical paths: 100% coverage required
  • Security components: 100% coverage required

Performance Testing

Benchmark critical operations.

Benchmark Tests

use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn benchmark_workflow_creation(c: &mut Criterion) {
    c.bench_function("create_workflow", |b| {
        b.iter(|| {
            Workflow::new(black_box(test_config()))
        })
    });
}

fn benchmark_database_query(c: &mut Criterion) {
    c.bench_function("query_workflows", |b| {
        b.iter(|| {
            db.query_workflows(black_box(&filter))
        })
    });
}

criterion_group!(benches, benchmark_workflow_creation, benchmark_database_query);
criterion_main!(benches);

Run benchmarks:

cargo bench

Test Best Practices

  • Write tests before or alongside code (TDD approach)
  • Keep tests focused and isolated
  • Use descriptive test names that explain what is tested
  • Clean up test resources (databases, files, containers)
  • Mock external dependencies to avoid flaky tests
  • Test both success and error conditions
  • Maintain shared test fixtures for consistency
  • Run tests in CI/CD pipeline
  • Monitor test execution time (fail if too slow)
  • Refactor tests alongside production code

Continuous Testing

Watch Mode

Auto-run tests on code changes:

# Install cargo-watch
cargo install cargo-watch

# Watch and run tests
cargo watch -x test

# Watch specific package
cargo watch -x 'test -p provisioning-orchestrator'

Pre-Commit Testing

Run tests automatically before commits:

# Install pre-commit hooks
pre-commit install

# Runs on every commit:
# - cargo test
# - cargo clippy
# - cargo fmt --check

Contributing

Guidelines for contributing to the Provisioning platform including setup, workflow, and best practices.

Getting Started

Prerequisites

Install required development tools:

# Rust toolchain (latest stable)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Nushell shell
brew install nushell

# Nickel configuration language
brew install nickel

# Just task runner
brew install just

# Additional development tools
cargo install cargo-watch cargo-tarpaulin cargo-audit

Development Workflow

Follow these guidelines for all code changes and ensure adherence to the project’s technical standards.

  1. Read applicable language guidelines
  2. Create feature branch from main
  3. Make changes following project standards
  4. Write or update tests
  5. Run full test suite and linting
  6. Create pull request with clear description

Code Style Guidelines

Rust Code

Rust code guidelines:

  • Use idiomatic Rust patterns
  • No unwrap() in production code
  • Comprehensive error handling with custom error types
  • Format with cargo fmt
  • Pass cargo clippy -- -D warnings with zero warnings
  • Add inline documentation for public APIs

Nushell Scripts

Nushell code guidelines:

  • Use structured data pipelines
  • Avoid external command dependencies where possible
  • Handle errors gracefully with try-catch
  • Document functions with comments
  • Use type annotations for clarity

Nickel Schemas

Nickel configuration guidelines:

  • Define clear type constraints
  • Use lazy evaluation appropriately
  • Provide default values where sensible
  • Document schema fields
  • Validate schemas with nickel typecheck

Testing Requirements

All contributions must include appropriate tests:

Required Tests

  • Unit tests for all new functions
  • Integration tests for component interactions
  • Security tests for security-related changes
  • Documentation tests for code examples

Running Tests

# Run all tests
just test

# Run specific test suite
cargo test -p provisioning-orchestrator

# Run with coverage
cargo tarpaulin --out Html

Test Coverage Requirements

  • Unit tests: Minimum 80% code coverage
  • Critical paths: 100% coverage
  • Security components: 100% coverage

Documentation

Required Documentation

All code changes must include:

  • Inline code documentation for public APIs
  • Updated README if adding new components
  • Examples showing usage
  • Migration guide for breaking changes

Documentation Standards

Documentation standards:

  • Use Markdown for all documentation
  • Code blocks must specify language
  • Keep lines ≤150 characters
  • No bare URLs (use markdown links)
  • Test all code examples

Commit Message Format

Use conventional commit format:

<type>(<scope>): <subject>

<body>

<footer>

Types:

  • feat: New feature
  • fix: Bug fix
  • docs: Documentation changes
  • test: Adding or updating tests
  • refactor: Code refactoring
  • perf: Performance improvements
  • chore: Maintenance tasks

Example:

feat(orchestrator): add workflow retry mechanism

- Implement exponential backoff strategy
- Add max retry configuration option
- Update workflow state tracking

Closes #123

Pull Request Process

Before Creating PR

  1. Update your branch with latest main
  2. Run full test suite: just test
  3. Run linters: just lint
  4. Format code: just fmt
  5. Build successfully: just build-all

PR Description Template

## Description
Brief description of changes and motivation

## Type of Change
- [ ] Bug fix (non-breaking change fixing an issue)
- [ ] New feature (non-breaking change adding functionality)
- [ ] Breaking change (fix or feature causing existing functionality to change)
- [ ] Documentation update

## Testing
- [ ] Unit tests added or updated
- [ ] Integration tests pass
- [ ] Manual testing completed
- [ ] Test coverage maintained or improved

## Checklist
- [ ] Code follows project style guidelines
- [ ] Self-review completed
- [ ] Documentation updated
- [ ] No new compiler warnings
- [ ] Tested on relevant platforms

## Related Issues
Closes #<issue-number>

Code Review

All PRs require code review before merging. Reviewers check:

  • Correctness and quality of implementation
  • Test coverage and quality
  • Documentation completeness
  • Adherence to style guidelines
  • Security implications
  • Performance considerations
  • Breaking changes properly documented

Development Best Practices

Code Quality

  • Write self-documenting code with clear naming
  • Keep functions focused and single-purpose
  • Avoid premature optimization
  • Use meaningful variable and function names
  • Comment complex logic, not obvious code

Error Handling

  • Use custom error types, not strings
  • Provide context in error messages
  • Handle errors at appropriate level
  • Log errors with sufficient detail
  • Never ignore errors silently

Performance

  • Profile before optimizing
  • Use appropriate data structures
  • Minimize allocations in hot paths
  • Consider async for I/O-bound operations
  • Benchmark performance-critical code

Security

  • Validate all inputs
  • Never log sensitive data
  • Use constant-time comparisons for secrets
  • Follow principle of least privilege
  • Review security guidelines for security-related changes

Getting Help

Need assistance with contributions?

  1. Check existing documentation in docs/
  2. Search for similar closed issues and PRs
  3. Ask questions in GitHub Discussions
  4. Reach out to maintainers

Recognition

Contributors are recognized in:

  • CONTRIBUTORS.md file
  • Release notes for significant contributions
  • Project documentation acknowledgments

Thank you for contributing to the Provisioning platform!

API Reference

Complete API documentation for the Provisioning platform, including REST endpoints, CLI commands, and library interfaces.

Available APIs

The Provisioning platform provides multiple API surfaces for different use cases and integration patterns.

REST API

HTTP-based APIs for external integration and programmatic access.

Command-Line Interface

Native CLI for interactive and scripted operations.

Nushell Libraries

Internal library APIs for extension development and customization.

API Categories

Infrastructure Management

Manage cloud resources, servers, and infrastructure components.

REST Endpoints:

  • Server Management - Create, delete, update, list servers
  • Provider Integration - Cloud provider operations
  • Network Configuration - Network, firewall, routing

CLI Commands:

  • provisioning server - Server lifecycle operations
  • provisioning provider - Provider configuration
  • provisioning infrastructure - Infrastructure queries

Service Orchestration

Deploy and manage infrastructure services and clusters.

REST Endpoints:

  • Task Service Deployment - Install, remove, update services
  • Cluster Management - Cluster lifecycle operations
  • Dependency Resolution - Automatic dependency handling

CLI Commands:

  • provisioning taskserv - Task service operations
  • provisioning cluster - Cluster management
  • provisioning workflow - Workflow execution

Workflow Automation

Execute batch operations and complex workflows.

REST Endpoints:

  • Workflow Submission - Submit and track workflows
  • Task Status - Real-time task monitoring
  • Checkpoint Recovery - Resume interrupted workflows

CLI Commands:

  • provisioning batch - Batch workflow operations
  • provisioning workflow - Workflow management
  • provisioning orchestrator - Orchestrator control

Configuration Management

Manage configuration across hierarchical layers.

REST Endpoints:

  • Configuration Retrieval - Get active configuration
  • Validation - Validate configuration files
  • Schema Queries - Query configuration schemas

CLI Commands:

  • provisioning config - Configuration operations
  • provisioning validate - Validation commands
  • provisioning schema - Schema management

Security & Authentication

Manage authentication, authorization, secrets, and encryption.

REST Endpoints:

  • Authentication - Login, token management, MFA
  • Authorization - Policy evaluation, permissions
  • Secrets Management - Secret storage and retrieval
  • KMS Operations - Key management and encryption
  • Audit Logging - Security event tracking

CLI Commands:

  • provisioning auth - Authentication operations
  • provisioning vault - Secret management
  • provisioning kms - Key management
  • provisioning audit - Audit log queries

Platform Services

Control platform components and system health.

REST Endpoints:

  • Service Health - Health checks and status
  • Service Control - Start, stop, restart services
  • Configuration - Service configuration management
  • Monitoring - Metrics and performance data

CLI Commands:

  • provisioning platform - Platform management
  • provisioning service - Service control
  • provisioning health - Health monitoring

API Conventions

REST API Standards

All REST endpoints follow consistent conventions:

Authentication:

Authorization: Bearer <jwt-token>

Request Format:

Content-Type: application/json

Response Format:

{
  "status": "success | error",
  "data": { ... },
  "message": "Human-readable message",
  "timestamp": "2026-01-16T10:30:00Z"
}

Error Responses:

{
  "status": "error",
  "error": {
    "code": "ERR_CODE",
    "message": "Error description",
    "details": { ... }
  },
  "timestamp": "2026-01-16T10:30:00Z"
}
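
Since every endpoint wraps its payload in this envelope, clients typically unpack it in one place. The sketch below is illustrative only; `ApiError` and `unwrap` are hypothetical helper names, not part of the platform.

```python
# Illustrative sketch: unpacking the common response envelope described above.
# ApiError and unwrap are hypothetical helpers, not platform APIs.

class ApiError(Exception):
    """Raised when a response carries status == "error"."""
    def __init__(self, code, message, details=None):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.details = details or {}

def unwrap(response: dict):
    """Return the payload of a success envelope, or raise ApiError."""
    if response.get("status") == "success":
        return response.get("data")
    err = response.get("error", {})
    raise ApiError(err.get("code", "ERR_UNKNOWN"),
                   err.get("message", "unknown error"),
                   err.get("details"))

ok = {"status": "success", "data": {"id": 1}, "timestamp": "2026-01-16T10:30:00Z"}
assert unwrap(ok) == {"id": 1}
```

Centralizing this keeps error handling uniform: every caller either gets the `data` field or a typed exception carrying the error code.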

CLI Command Patterns

All CLI commands follow consistent patterns:

Common Flags:

  • --yes - Skip confirmation prompts
  • --check - Dry-run mode, show what would happen
  • --wait - Wait for operation completion
  • --format json | yaml | table - Output format
  • --verbose - Detailed output
  • --quiet - Minimal output

Command Structure:

provisioning <domain> <action> <resource> [flags]

Examples:

provisioning server create web-01 --plan medium --yes
provisioning taskserv install kubernetes --cluster prod
provisioning workflow submit deploy.ncl --wait

Library Function Signatures

Nushell library functions follow consistent signatures:

Parameter Order:

  1. Required positional parameters
  2. Optional positional parameters
  3. Named parameters (flags)

Return Values:

  • Success: Returns data structure (record, table, list)
  • Error: Throws error with structured message

Example:

def create-server [
  name: string           # Required: server name
  --plan: string = "medium"  # Optional: server plan
  --wait                 # Optional: wait flag
] {
  # Implementation
}

API Versioning

The Provisioning platform uses semantic versioning for APIs:

  • Major version - Breaking changes to API contracts
  • Minor version - Backwards-compatible additions
  • Patch version - Backwards-compatible bug fixes

Current API Version: v1.0.0

Version Compatibility:

  • REST API includes version in URL: /api/v1/servers
  • CLI maintains backwards compatibility across minor versions
  • Libraries use semantic import versioning

Rate Limiting

REST API endpoints implement rate limiting to ensure platform stability:

  • Default Limit: 100 requests per minute per API key
  • Burst Limit: 20 requests per second
  • Headers: Rate limit information in response headers

X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1642334400
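
A well-behaved client reads these headers before retrying. This is a minimal sketch, assuming `X-RateLimit-Reset` is a Unix epoch timestamp as shown above; `seconds_until_reset` is a hypothetical helper.

```python
# Illustrative sketch (not platform code): using the rate-limit headers above
# to decide how long a client should pause before its next request.

def seconds_until_reset(headers: dict, now: float) -> float:
    """Return how long to wait if the quota is exhausted, else 0."""
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0  # quota left, no need to wait
    reset_epoch = int(headers.get("X-RateLimit-Reset", "0"))
    return max(0.0, reset_epoch - now)

headers = {"X-RateLimit-Limit": "100",
           "X-RateLimit-Remaining": "0",
           "X-RateLimit-Reset": "1642334400"}
# With the quota exhausted, wait until the advertised reset time.
assert seconds_until_reset(headers, now=1642334390.0) == 10.0
```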

Authentication

All APIs require authentication except public health endpoints.

Supported Methods:

  • JWT Tokens - Primary authentication method
  • API Keys - For service-to-service integration
  • MFA - Multi-factor authentication for sensitive operations

Token Management:

# Login and obtain token
provisioning auth login --user admin

# Use token in requests
curl -H "Authorization: Bearer $TOKEN" https://api/v1/servers

See Authentication Guide for complete details.

API Discovery

Discover available APIs programmatically:

REST API:

# Get API specification (OpenAPI)
curl https://api/v1/openapi.json

CLI:

# List all commands
provisioning help --all

# Get command details
provisioning server help

Libraries:

# List available modules
use lib_provisioning *
$nu.scope.commands | where is_custom

REST API Reference

Complete HTTP API documentation for the Provisioning platform covering 83+ endpoints across 9 platform services.

Base URL

https://api.provisioning.local/api/v1

All endpoints are prefixed with /api/v1 for version compatibility.

Authentication

All API requests require authentication using JWT Bearer tokens:

Authorization: Bearer <your-jwt-token>

Obtain tokens via the Authentication API endpoints.

Common Response Format

All responses follow a consistent structure:

Success Response:

{
  "status": "success",
  "data": { ... },
  "message": "Operation completed successfully",
  "timestamp": "2026-01-16T10:30:00Z"
}

Error Response:

{
  "status": "error",
  "error": {
    "code": "ERR_CODE",
    "message": "Human-readable error message",
    "details": { ... }
  },
  "timestamp": "2026-01-16T10:30:00Z"
}

HTTP Status Codes

| Code | Meaning | Usage |
|------|---------|-------|
| 200 | OK | Successful GET, PUT, PATCH requests |
| 201 | Created | Successful POST request creating a resource |
| 202 | Accepted | Async operation accepted, check status |
| 204 | No Content | Successful DELETE request |
| 400 | Bad Request | Invalid request parameters |
| 401 | Unauthorized | Missing or invalid authentication |
| 403 | Forbidden | Valid auth but insufficient permissions |
| 404 | Not Found | Resource does not exist |
| 409 | Conflict | Resource conflict (duplicate name, etc.) |
| 429 | Too Many Requests | Rate limit exceeded |
| 500 | Internal Server Error | Server error |
| 503 | Service Unavailable | Service temporarily unavailable |

API Services

The platform exposes 9 distinct services with REST APIs:

  1. Orchestrator - Workflow execution and task management
  2. Control Center - Platform management and monitoring
  3. Extension Registry - Extension distribution
  4. Auth Service - Authentication and identity
  5. Vault Service - Secrets management
  6. KMS Service - Key management and encryption
  7. Audit Service - Audit logging and compliance
  8. Policy Service - Authorization policies
  9. Gateway Service - API gateway and routing

Orchestrator API

Workflow execution, task scheduling, and state management.

Base Path: /api/v1/orchestrator

Submit Workflow

Submit a new workflow for execution.

Endpoint: POST /workflows

Request:

{
  "name": "deploy-cluster",
  "type": "cluster",
  "operations": [
    {
      "id": "create-servers",
      "type": "server",
      "action": "create",
      "params": {
        "infra": "my-cluster.ncl"
      }
    },
    {
      "id": "install-k8s",
      "type": "taskserv",
      "action": "install",
      "params": {
        "name": "kubernetes"
      },
      "dependencies": ["create-servers"]
    }
  ],
  "priority": "normal",
  "checkpoint_enabled": true
}

Response:

{
  "status": "success",
  "data": {
    "workflow_id": "wf-20260116-abc123",
    "state": "queued",
    "created_at": "2026-01-16T10:30:00Z"
  }
}
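
Because each operation may declare `dependencies` on other operation IDs, it is worth checking the payload for dangling references before submitting. The sketch below builds the payload shown above; `validate_deps` is a hypothetical client-side helper, not a platform API.

```python
# Illustrative client-side sketch: assembling a workflow payload like the one
# above and checking that every "dependencies" entry names a defined operation.
# validate_deps is a hypothetical helper, not part of the platform.

def validate_deps(operations: list) -> None:
    ids = {op["id"] for op in operations}
    for op in operations:
        for dep in op.get("dependencies", []):
            if dep not in ids:
                raise ValueError(f"{op['id']} depends on unknown operation {dep}")

workflow = {
    "name": "deploy-cluster",
    "type": "cluster",
    "operations": [
        {"id": "create-servers", "type": "server", "action": "create",
         "params": {"infra": "my-cluster.ncl"}},
        {"id": "install-k8s", "type": "taskserv", "action": "install",
         "params": {"name": "kubernetes"}, "dependencies": ["create-servers"]},
    ],
    "priority": "normal",
    "checkpoint_enabled": True,
}
validate_deps(workflow["operations"])  # raises on a dangling dependency
```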

Get Workflow Status

Retrieve workflow execution status.

Endpoint: GET /workflows/{workflow_id}

Response:

{
  "status": "success",
  "data": {
    "workflow_id": "wf-20260116-abc123",
    "name": "deploy-cluster",
    "state": "running",
    "progress": {
      "total_tasks": 2,
      "completed": 1,
      "failed": 0,
      "running": 1
    },
    "current_task": {
      "id": "install-k8s",
      "state": "running",
      "started_at": "2026-01-16T10:32:00Z"
    },
    "created_at": "2026-01-16T10:30:00Z",
    "updated_at": "2026-01-16T10:32:15Z"
  }
}

List Workflows

List all workflows with optional filtering.

Endpoint: GET /workflows

Query Parameters:

  • state (optional) - Filter by state: queued | running | completed | failed
  • limit (optional) - Maximum results (default: 50, max: 100)
  • offset (optional) - Pagination offset

Response:

{
  "status": "success",
  "data": {
    "workflows": [
      {
        "workflow_id": "wf-20260116-abc123",
        "name": "deploy-cluster",
        "state": "running",
        "created_at": "2026-01-16T10:30:00Z"
      }
    ],
    "total": 1,
    "limit": 50,
    "offset": 0
  }
}

Cancel Workflow

Cancel a running workflow.

Endpoint: POST /workflows/{workflow_id}/cancel

Response:

{
  "status": "success",
  "data": {
    "workflow_id": "wf-20260116-abc123",
    "state": "cancelled",
    "cancelled_at": "2026-01-16T10:35:00Z"
  }
}

Get Task Logs

Retrieve logs for a specific task in a workflow.

Endpoint: GET /workflows/{workflow_id}/tasks/{task_id}/logs

Query Parameters:

  • lines (optional) - Number of lines (default: 100)
  • follow (optional) - Stream logs (SSE)

Response:

{
  "status": "success",
  "data": {
    "task_id": "install-k8s",
    "logs": [
      {
        "timestamp": "2026-01-16T10:32:00Z",
        "level": "info",
        "message": "Starting Kubernetes installation"
      },
      {
        "timestamp": "2026-01-16T10:32:15Z",
        "level": "info",
        "message": "Downloading Kubernetes binaries"
      }
    ]
  }
}

Resume Workflow

Resume a failed workflow from checkpoint.

Endpoint: POST /workflows/{workflow_id}/resume

Request:

{
  "from_checkpoint": "create-servers",
  "skip_failed": false
}

Response:

{
  "status": "success",
  "data": {
    "workflow_id": "wf-20260116-abc123",
    "state": "running",
    "resumed_at": "2026-01-16T10:40:00Z"
  }
}

Control Center API

Platform management, service control, and monitoring.

Base Path: /api/v1/control-center

List Services

List all platform services and their status.

Endpoint: GET /services

Response:

{
  "status": "success",
  "data": {
    "services": [
      {
        "name": "orchestrator",
        "state": "running",
        "health": "healthy",
        "uptime": 86400,
        "version": "1.0.0"
      },
      {
        "name": "control-center",
        "state": "running",
        "health": "healthy",
        "uptime": 86400,
        "version": "1.0.0"
      }
    ]
  }
}

Get Service Health

Check health status of a specific service.

Endpoint: GET /services/{service_name}/health

Response:

{
  "status": "success",
  "data": {
    "service": "orchestrator",
    "health": "healthy",
    "checks": {
      "api": "pass",
      "database": "pass",
      "storage": "pass"
    },
    "timestamp": "2026-01-16T10:30:00Z"
  }
}

Start Service

Start a stopped platform service.

Endpoint: POST /services/{service_name}/start

Response:

{
  "status": "success",
  "data": {
    "service": "orchestrator",
    "state": "starting",
    "message": "Service start initiated"
  }
}

Stop Service

Gracefully stop a running service.

Endpoint: POST /services/{service_name}/stop

Request:

{
  "force": false,
  "timeout": 30
}

Response:

{
  "status": "success",
  "data": {
    "service": "orchestrator",
    "state": "stopped",
    "message": "Service stopped gracefully"
  }
}

Restart Service

Restart a platform service.

Endpoint: POST /services/{service_name}/restart

Response:

{
  "status": "success",
  "data": {
    "service": "orchestrator",
    "state": "restarting",
    "message": "Service restart initiated"
  }
}

Get Service Configuration

Retrieve service configuration.

Endpoint: GET /services/{service_name}/config

Response:

{
  "status": "success",
  "data": {
    "service": "orchestrator",
    "config": {
      "port": 8080,
      "max_workers": 10,
      "checkpoint_enabled": true
    }
  }
}

Update Service Configuration

Update service configuration (requires restart).

Endpoint: PUT /services/{service_name}/config

Request:

{
  "config": {
    "max_workers": 20
  },
  "restart": true
}

Response:

{
  "status": "success",
  "data": {
    "service": "orchestrator",
    "config_updated": true,
    "restart_required": true
  }
}

Get Platform Metrics

Retrieve platform-wide metrics.

Endpoint: GET /metrics

Response:

{
  "status": "success",
  "data": {
    "platform": {
      "uptime": 86400,
      "version": "1.0.0"
    },
    "resources": {
      "cpu_usage": 45.2,
      "memory_usage": 62.8,
      "disk_usage": 38.1
    },
    "workflows": {
      "total": 150,
      "running": 5,
      "queued": 2,
      "failed": 3
    },
    "timestamp": "2026-01-16T10:30:00Z"
  }
}

Extension Registry API

Extension distribution, versioning, and discovery.

Base Path: /api/v1/registry

List Extensions

List available extensions.

Endpoint: GET /extensions

Query Parameters:

  • type (optional) - Filter by type: provider | taskserv | cluster | workflow
  • search (optional) - Search by name or description

Response:

{
  "status": "success",
  "data": {
    "extensions": [
      {
        "name": "kubernetes",
        "type": "taskserv",
        "version": "1.29.0",
        "description": "Kubernetes orchestration platform",
        "dependencies": ["containerd", "etcd"]
      }
    ],
    "total": 1
  }
}

Get Extension Details

Get detailed information about an extension.

Endpoint: GET /extensions/{extension_name}

Response:

{
  "status": "success",
  "data": {
    "name": "kubernetes",
    "type": "taskserv",
    "version": "1.29.0",
    "description": "Kubernetes orchestration platform",
    "dependencies": ["containerd", "etcd"],
    "versions": ["1.29.0", "1.28.5", "1.27.10"],
    "metadata": {
      "author": "Provisioning Team",
      "license": "Apache-2.0",
      "homepage": "https://kubernetes.io"
    }
  }
}

Download Extension

Download an extension package.

Endpoint: GET /extensions/{extension_name}/download

Query Parameters:

  • version (optional) - Specific version (default: latest)

Response: Binary OCI image blob

Publish Extension

Publish a new extension or version.

Endpoint: POST /extensions

Request: Multipart form data with OCI image

Response:

{
  "status": "success",
  "data": {
    "name": "kubernetes",
    "version": "1.29.0",
    "published_at": "2026-01-16T10:30:00Z"
  }
}

Auth Service API

Authentication, identity management, and MFA.

Base Path: /api/v1/auth

Login

Authenticate user and obtain JWT token.

Endpoint: POST /login

Request:

{
  "username": "admin",
  "password": "secure-password"
}

Response:

{
  "status": "success",
  "data": {
    "token": "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9...",
    "refresh_token": "refresh-token-abc123",
    "expires_in": 3600,
    "user": {
      "id": "user-123",
      "username": "admin",
      "roles": ["admin"]
    }
  }
}

MFA Challenge

Request MFA challenge for two-factor authentication.

Endpoint: POST /mfa/challenge

Request:

{
  "username": "admin",
  "password": "secure-password"
}

Response:

{
  "status": "success",
  "data": {
    "challenge_id": "challenge-abc123",
    "methods": ["totp", "webauthn"],
    "expires_in": 300
  }
}

MFA Verify

Verify MFA code and complete authentication.

Endpoint: POST /mfa/verify

Request:

{
  "challenge_id": "challenge-abc123",
  "method": "totp",
  "code": "123456"
}

Response:

{
  "status": "success",
  "data": {
    "token": "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9...",
    "refresh_token": "refresh-token-abc123",
    "expires_in": 3600
  }
}

Refresh Token

Obtain new access token using refresh token.

Endpoint: POST /refresh

Request:

{
  "refresh_token": "refresh-token-abc123"
}

Response:

{
  "status": "success",
  "data": {
    "token": "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9...",
    "expires_in": 3600
  }
}

Logout

Invalidate current session and tokens.

Endpoint: POST /logout

Request:

{
  "refresh_token": "refresh-token-abc123"
}

Response:

{
  "status": "success",
  "message": "Logged out successfully"
}

Create User

Create a new user account (admin only).

Endpoint: POST /users

Request:

{
  "username": "developer",
  "email": "dev@example.com",
  "password": "secure-password",
  "roles": ["developer"]
}

Response:

{
  "status": "success",
  "data": {
    "user_id": "user-456",
    "username": "developer",
    "created_at": "2026-01-16T10:30:00Z"
  }
}

List Users

List all users (admin only).

Endpoint: GET /users

Response:

{
  "status": "success",
  "data": {
    "users": [
      {
        "user_id": "user-123",
        "username": "admin",
        "email": "admin@example.com",
        "roles": ["admin"],
        "created_at": "2026-01-01T00:00:00Z"
      }
    ],
    "total": 1
  }
}

Vault Service API

Secrets management and dynamic credentials.

Base Path: /api/v1/vault

Store Secret

Store a new secret.

Endpoint: POST /secrets

Request:

{
  "path": "database/postgres/password",
  "data": {
    "username": "dbuser",
    "password": "db-password"
  },
  "metadata": {
    "description": "PostgreSQL credentials"
  }
}

Response:

{
  "status": "success",
  "data": {
    "path": "database/postgres/password",
    "version": 1,
    "created_at": "2026-01-16T10:30:00Z"
  }
}

Retrieve Secret

Retrieve a stored secret.

Endpoint: GET /secrets/{path}

Query Parameters:

  • version (optional) - Specific version (default: latest)

Response:

{
  "status": "success",
  "data": {
    "path": "database/postgres/password",
    "version": 1,
    "data": {
      "username": "dbuser",
      "password": "db-password"
    },
    "metadata": {
      "description": "PostgreSQL credentials"
    },
    "created_at": "2026-01-16T10:30:00Z"
  }
}

List Secrets

List all secret paths.

Endpoint: GET /secrets

Query Parameters:

  • prefix (optional) - Filter by path prefix

Response:

{
  "status": "success",
  "data": {
    "secrets": [
      {
        "path": "database/postgres/password",
        "versions": 1,
        "updated_at": "2026-01-16T10:30:00Z"
      }
    ],
    "total": 1
  }
}

Delete Secret

Delete a secret (soft delete, preserves versions).

Endpoint: DELETE /secrets/{path}

Response:

{
  "status": "success",
  "message": "Secret deleted successfully"
}

Generate Dynamic Credentials

Generate temporary credentials for supported backends.

Endpoint: POST /dynamic/{backend}/generate

Request:

{
  "role": "readonly",
  "ttl": 3600
}

Response:

{
  "status": "success",
  "data": {
    "credentials": {
      "username": "v-readonly-abc123",
      "password": "temporary-password"
    },
    "ttl": 3600,
    "expires_at": "2026-01-16T11:30:00Z"
  }
}

KMS Service API

Key management, encryption, and decryption.

Base Path: /api/v1/kms

Encrypt Data

Encrypt data using a managed key.

Endpoint: POST /encrypt

Request:

{
  "key_id": "master-key-01",
  "plaintext": "sensitive data",
  "context": {
    "purpose": "config-encryption"
  }
}

Response:

{
  "status": "success",
  "data": {
    "ciphertext": "AQICAHh...",
    "key_id": "master-key-01"
  }
}

Decrypt Data

Decrypt previously encrypted data.

Endpoint: POST /decrypt

Request:

{
  "ciphertext": "AQICAHh...",
  "context": {
    "purpose": "config-encryption"
  }
}

Response:

{
  "status": "success",
  "data": {
    "plaintext": "sensitive data",
    "key_id": "master-key-01"
  }
}

Create Key

Create a new encryption key.

Endpoint: POST /keys

Request:

{
  "key_id": "app-key-01",
  "algorithm": "AES-256-GCM",
  "metadata": {
    "description": "Application encryption key"
  }
}

Response:

{
  "status": "success",
  "data": {
    "key_id": "app-key-01",
    "algorithm": "AES-256-GCM",
    "created_at": "2026-01-16T10:30:00Z"
  }
}

List Keys

List all encryption keys.

Endpoint: GET /keys

Response:

{
  "status": "success",
  "data": {
    "keys": [
      {
        "key_id": "master-key-01",
        "algorithm": "AES-256-GCM",
        "state": "enabled",
        "created_at": "2026-01-01T00:00:00Z"
      }
    ],
    "total": 1
  }
}

Rotate Key

Rotate an encryption key.

Endpoint: POST /keys/{key_id}/rotate

Response:

{
  "status": "success",
  "data": {
    "key_id": "master-key-01",
    "version": 2,
    "rotated_at": "2026-01-16T10:30:00Z"
  }
}

Audit Service API

Audit logging, compliance tracking, and event queries.

Base Path: /api/v1/audit

Query Audit Logs

Query audit events with filtering.

Endpoint: GET /logs

Query Parameters:

  • user (optional) - Filter by user ID
  • action (optional) - Filter by action type
  • resource (optional) - Filter by resource type
  • start_time (optional) - Start timestamp
  • end_time (optional) - End timestamp
  • limit (optional) - Maximum results (default: 100)

Response:

{
  "status": "success",
  "data": {
    "events": [
      {
        "event_id": "evt-abc123",
        "timestamp": "2026-01-16T10:30:00Z",
        "user": "admin",
        "action": "workflow.submit",
        "resource": "wf-20260116-abc123",
        "result": "success",
        "metadata": {
          "workflow_name": "deploy-cluster"
        }
      }
    ],
    "total": 1
  }
}

Export Audit Logs

Export audit logs in various formats.

Endpoint: GET /export

Query Parameters:

  • format - Export format: json | csv | syslog | cef | splunk
  • start_time - Start timestamp
  • end_time - End timestamp

Response: File download in requested format

Get Compliance Report

Generate compliance report for specific period.

Endpoint: GET /compliance

Query Parameters:

  • standard - Compliance standard: gdpr | soc2 | iso27001
  • start_time - Report start time
  • end_time - Report end time

Response:

{
  "status": "success",
  "data": {
    "standard": "soc2",
    "period": {
      "start": "2026-01-01T00:00:00Z",
      "end": "2026-01-16T23:59:59Z"
    },
    "controls": [
      {
        "control_id": "CC6.1",
        "status": "compliant",
        "evidence_count": 150
      }
    ],
    "summary": {
      "total_controls": 10,
      "compliant": 9,
      "non_compliant": 1
    }
  }
}

Policy Service API

Authorization policy management (Cedar policies).

Base Path: /api/v1/policy

Evaluate Policy

Evaluate authorization request against policies.

Endpoint: POST /evaluate

Request:

{
  "principal": "User::\"admin\"",
  "action": "Action::\"workflow.submit\"",
  "resource": "Workflow::\"deploy-cluster\"",
  "context": {
    "time": "2026-01-16T10:30:00Z"
  }
}

Response:

{
  "status": "success",
  "data": {
    "decision": "allow",
    "policies": ["admin-full-access"],
    "diagnostics": {
      "reason": "User has admin role"
    }
  }
}
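
Conceptually, Cedar's default-deny evaluation matches the (principal, action, resource) tuple against permit rules. The toy evaluator below is not Cedar and not platform code; it only illustrates the allow/deny shape of the decision returned above.

```python
# Conceptual sketch only: Cedar evaluates real policies server-side. This toy
# evaluator merely illustrates default-deny matching on (principal, action).

def evaluate(policies: list, principal: str, action: str) -> str:
    """Return "allow" if any permit rule matches, else "deny" (default-deny)."""
    for p in policies:
        if principal in p["principals"] and (p["actions"] == "*" or action in p["actions"]):
            return "allow"
    return "deny"

policies = [{"id": "admin-full-access",
             "principals": {'User::"admin"'},
             "actions": "*"}]
assert evaluate(policies, 'User::"admin"', 'Action::"workflow.submit"') == "allow"
assert evaluate(policies, 'User::"dev"', 'Action::"workflow.submit"') == "deny"
```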

Create Policy

Create a new authorization policy.

Endpoint: POST /policies

Request:

{
  "policy_id": "developer-read-only",
  "content": "permit(principal in Role::\"developer\", action == Action::\"read\", resource);",
  "description": "Developers have read-only access"
}

Response:

{
  "status": "success",
  "data": {
    "policy_id": "developer-read-only",
    "created_at": "2026-01-16T10:30:00Z"
  }
}

List Policies

List all authorization policies.

Endpoint: GET /policies

Response:

{
  "status": "success",
  "data": {
    "policies": [
      {
        "policy_id": "admin-full-access",
        "description": "Admins have full access",
        "created_at": "2026-01-01T00:00:00Z"
      }
    ],
    "total": 1
  }
}

Update Policy

Update an existing policy (hot reload).

Endpoint: PUT /policies/{policy_id}

Request:

{
  "content": "permit(principal in Role::\"developer\", action == Action::\"read\", resource);"
}

Response:

{
  "status": "success",
  "data": {
    "policy_id": "developer-read-only",
    "updated_at": "2026-01-16T10:30:00Z",
    "reloaded": true
  }
}

Delete Policy

Delete an authorization policy.

Endpoint: DELETE /policies/{policy_id}

Response:

{
  "status": "success",
  "message": "Policy deleted successfully"
}

Gateway Service API

API gateway, routing, and rate limiting.

Base Path: /api/v1/gateway

Get Route Configuration

Retrieve current routing configuration.

Endpoint: GET /routes

Response:

{
  "status": "success",
  "data": {
    "routes": [
      {
        "path": "/api/v1/orchestrator/*",
        "target": "http://orchestrator:8080",
        "methods": ["GET", "POST", "PUT", "DELETE"],
        "auth_required": true
      }
    ]
  }
}

Update Routes

Update gateway routing (hot reload).

Endpoint: PUT /routes

Request:

{
  "routes": [
    {
      "path": "/api/v1/custom/*",
      "target": "http://custom-service:9000",
      "methods": ["GET", "POST"],
      "auth_required": true
    }
  ]
}

Response:

{
  "status": "success",
  "message": "Routes updated successfully"
}

Get Rate Limits

Retrieve rate limiting configuration.

Endpoint: GET /rate-limits

Response:

{
  "status": "success",
  "data": {
    "global": {
      "requests_per_minute": 100,
      "burst": 20
    },
    "per_user": {
      "requests_per_minute": 60,
      "burst": 10
    }
  }
}

Error Codes

Common error codes returned by the API:

| Code | Description |
|------|-------------|
| ERR_AUTH_INVALID | Invalid authentication credentials |
| ERR_AUTH_EXPIRED | Token expired |
| ERR_AUTH_MFA_REQUIRED | MFA verification required |
| ERR_FORBIDDEN | Insufficient permissions |
| ERR_NOT_FOUND | Resource not found |
| ERR_CONFLICT | Resource conflict |
| ERR_VALIDATION | Invalid request parameters |
| ERR_RATE_LIMIT | Rate limit exceeded |
| ERR_WORKFLOW_FAILED | Workflow execution failed |
| ERR_SERVICE_UNAVAILABLE | Service temporarily unavailable |
| ERR_INTERNAL | Internal server error |

Rate Limiting Headers

All responses include rate limiting headers:

X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1642334400
X-RateLimit-Retry-After: 60
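A client can use these headers to back off gracefully when throttled. The sketch below is a minimal illustration, assuming a hypothetical `send` callable that returns the status, headers, and body of one request attempt; the header name matches the documentation above.

```python
import time

def request_with_backoff(send, max_attempts=3):
    """Retry a request, honoring the X-RateLimit-Retry-After header.

    `send` is a hypothetical callable returning (status, headers, body).
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        # The server states how many seconds to wait before retrying
        wait = int(headers.get("X-RateLimit-Retry-After", 60))
        time.sleep(wait)
    return status, headers, body
```

Checking `X-RateLimit-Remaining` before sending bursts of requests avoids hitting the limit in the first place.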

Pagination

List endpoints support offset-based pagination:

Request:

GET /api/v1/workflows?limit=50&offset=100

Response includes:

{
  "data": { ... },
  "total": 500,
  "limit": 50,
  "offset": 100,
  "has_more": true
}
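As a sketch of how a client might walk all pages, assuming `data` holds the list of items for list endpoints and `fetch_page` is a hypothetical helper wrapping the GET request:

```python
def iter_pages(fetch_page, limit=50):
    """Yield every item from an offset-paginated list endpoint.

    `fetch_page(limit, offset)` is a hypothetical helper returning a dict
    shaped like the response above: {"data": [...], "total": N,
    "limit": L, "offset": O, "has_more": bool}.
    """
    offset = 0
    while True:
        page = fetch_page(limit=limit, offset=offset)
        yield from page["data"]
        if not page["has_more"]:
            break
        offset += limit
```

The `has_more` flag terminates the loop without comparing `offset` against `total` manually.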

Webhooks

The platform supports webhook notifications for async operations:

Webhook Payload:

{
  "event": "workflow.completed",
  "timestamp": "2026-01-16T10:30:00Z",
  "data": {
    "workflow_id": "wf-20260116-abc123",
    "state": "completed"
  },
  "signature": "sha256=abc123..."
}

Configure webhooks via Control Center API.
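A receiver should validate the `signature` field before trusting a payload. The sketch below assumes the signature is an HMAC-SHA256 of the raw request body, hex-encoded with a `sha256=` prefix (the exact scheme is an assumption; confirm it in the Control Center webhook settings):

```python
import hashlib
import hmac

def verify_webhook(payload: bytes, signature: str, secret: bytes) -> bool:
    """Check a raw webhook body against its signature.

    Assumes HMAC-SHA256 over the raw body, hex-encoded, "sha256=" prefix.
    """
    expected = "sha256=" + hmac.new(secret, payload, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids timing side channels
    return hmac.compare_digest(expected, signature)
```

Always compute the HMAC over the raw request bytes, not a re-serialized JSON object, since key order and whitespace differences would change the digest.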

CLI Commands Reference

Complete command-line interface documentation for the Provisioning platform covering 111+ commands across 11 domain modules.

Command Structure

All commands follow the pattern:

provisioning <domain> <action> [resource] [flags]

Common Flags (available on most commands):

  • --yes - Skip confirmation prompts (auto-yes)
  • --check - Dry-run mode, show what would happen without executing
  • --wait - Wait for async operations to complete
  • --format <json|yaml|table> - Output format (default: table)
  • --verbose - Detailed output with debug information
  • --quiet - Minimal output, errors only
  • --help - Show command help

Quick Reference

Shortcuts - Single-letter aliases for common domains:

provisioning s = provisioning server
provisioning t = provisioning taskserv
provisioning c = provisioning cluster
provisioning w = provisioning workspace
provisioning cfg = provisioning config
provisioning b = provisioning batch

Help Navigation - Bi-directional help system:

provisioning help server = provisioning server help
provisioning help ws = provisioning workspace help

Domain Modules

The CLI is organized into 11 domain modules:

  1. Infrastructure - Server, provider, network management
  2. Orchestration - Workflow, batch, task execution
  3. Configuration - Config validation and management
  4. Workspace - Multi-workspace operations
  5. Development - Extensions and customization
  6. Utilities - Tools and helpers
  7. Generation - Schema and config generation
  8. Authentication - Auth, MFA, users
  9. Security - Vault, KMS, audit, policies
  10. Platform - Service control and monitoring
  11. Guides - Interactive documentation

Infrastructure Commands

Manage cloud infrastructure, servers, and resources.

Server Commands

provisioning server create [NAME]

Create a new server or servers from infrastructure configuration.

Flags:

  • --infra <file> - Nickel infrastructure file
  • --plan <size> - Server plan (small/medium/large/xlarge)
  • --provider <name> - Cloud provider (upcloud/aws/local)
  • --zone <name> - Availability zone
  • --ssh-key <path> - SSH public key path
  • --tags <key=value> - Server tags (repeatable)
  • --yes - Skip confirmation
  • --check - Dry-run mode
  • --wait - Wait for server creation

Examples:

# Create server from infrastructure file
provisioning server create --infra my-cluster.ncl --yes --wait

# Create single server interactively
provisioning server create web-01 --plan medium --provider upcloud

# Check what would be created (dry-run)
provisioning server create --infra cluster.ncl --check

provisioning server delete [NAME|ID]

Delete servers.

Flags:

  • --all - Delete all servers in current infrastructure
  • --force - Force deletion without cleanup
  • --yes - Skip confirmation

Examples:

# Delete specific server
provisioning server delete web-01 --yes

# Delete all servers
provisioning server delete --all --yes

provisioning server list

List all servers in the current workspace.

Flags:

  • --provider <name> - Filter by provider
  • --state <state> - Filter by state (running/stopped/error)
  • --format <format> - Output format

Examples:

# List all servers
provisioning server list

# List only running servers
provisioning server list --state running --format json

provisioning server status [NAME|ID]

Get detailed server status.

Examples:

provisioning server status web-01
provisioning server status --all

provisioning server ssh [NAME|ID]

SSH into a server.

Examples:

provisioning server ssh web-01
provisioning server ssh web-01 -- "systemctl status kubelet"

Provider Commands

provisioning provider list

List available cloud providers.

provisioning provider validate <NAME>

Validate provider configuration and credentials.

Examples:

provisioning provider validate upcloud
provisioning provider validate aws

provisioning provider zones <NAME>

List available zones for a provider.

Examples:

provisioning provider zones upcloud
provisioning provider zones aws --region us-east-1

Orchestration Commands

Execute workflows, batch operations, and manage tasks.

Workflow Commands

provisioning workflow submit <FILE>

Submit a workflow for execution.

Flags:

  • --priority <level> - Priority (low/normal/high/critical)
  • --checkpoint - Enable checkpoint recovery
  • --wait - Wait for completion

Examples:

# Submit workflow and wait
provisioning workflow submit deploy.ncl --wait

# Submit with high priority
provisioning workflow submit urgent.ncl --priority high

provisioning workflow status <ID>

Get workflow execution status.

Examples:

provisioning workflow status wf-20260116-abc123

provisioning workflow list

List workflows.

Flags:

  • --state <state> - Filter by state (queued/running/completed/failed)
  • --limit <num> - Maximum results

Examples:

# List running workflows
provisioning workflow list --state running

# List failed workflows
provisioning workflow list --state failed --format json

provisioning workflow cancel <ID>

Cancel a running workflow.

Examples:

provisioning workflow cancel wf-20260116-abc123 --yes

provisioning workflow resume <ID>

Resume a failed workflow from checkpoint.

Flags:

  • --from <checkpoint> - Resume from specific checkpoint
  • --skip-failed - Skip failed tasks

Examples:

# Resume from last checkpoint
provisioning workflow resume wf-20260116-abc123

# Resume from specific checkpoint
provisioning workflow resume wf-20260116-abc123 --from create-servers

provisioning workflow logs <ID>

View workflow logs.

Flags:

  • --task <id> - Show logs for specific task
  • --follow - Stream logs in real-time
  • --lines <num> - Number of lines (default: 100)

Examples:

# View all workflow logs
provisioning workflow logs wf-20260116-abc123

# Follow logs in real-time
provisioning workflow logs wf-20260116-abc123 --follow

# View specific task logs
provisioning workflow logs wf-20260116-abc123 --task install-k8s

Batch Commands

provisioning batch submit <FILE>

Submit a batch workflow with multiple operations.

Flags:

  • --parallel <num> - Maximum parallel operations
  • --wait - Wait for completion

Examples:

# Submit batch workflow
provisioning batch submit multi-region.ncl --parallel 3 --wait

provisioning batch status <ID>

Get batch workflow status with progress.

provisioning batch monitor <ID>

Monitor batch execution in real-time.

Configuration Commands

Validate and manage configuration.

provisioning config validate

Validate current configuration.

Flags:

  • --infra <file> - Specific infrastructure file
  • --all - Validate all configuration files

Examples:

# Validate workspace configuration
provisioning config validate

# Validate specific infrastructure
provisioning config validate --infra cluster.ncl

provisioning config show

Display effective configuration.

Flags:

  • --key <path> - Show specific config value
  • --format <format> - Output format

Examples:

# Show all configuration
provisioning config show

# Show specific value
provisioning config show --key paths.base

# Export as JSON
provisioning config show --format json > config.json

provisioning config reload

Reload configuration from files.

provisioning config diff

Show configuration differences between environments.

Flags:

  • --from <env> - Source environment
  • --to <env> - Target environment

Workspace Commands

Manage isolated workspaces.

provisioning workspace init <NAME>

Initialize a new workspace.

Flags:

  • --template <name> - Workspace template
  • --path <path> - Custom workspace path

Examples:

# Create workspace from default template
provisioning workspace init my-project

# Create from template
provisioning workspace init prod --template production

provisioning workspace switch <NAME>

Switch to a different workspace.

Examples:

provisioning workspace switch production
provisioning workspace switch dev

provisioning workspace list

List all workspaces.

Flags:

  • --format <format> - Output format

Examples:

provisioning workspace list
provisioning workspace list --format json

provisioning workspace current

Show current active workspace.

provisioning workspace delete <NAME>

Delete a workspace.

Flags:

  • --force - Force deletion without cleanup
  • --yes - Skip confirmation

Development Commands

Develop custom extensions.

provisioning extension create <TYPE> <NAME>

Create a new extension.

Types: provider, taskserv, cluster, workflow

Flags:

  • --template <name> - Extension template

Examples:

# Create new task service
provisioning extension create taskserv my-service

# Create new provider
provisioning extension create provider my-cloud --template basic

provisioning extension validate <PATH>

Validate extension structure and configuration.

provisioning extension package <PATH>

Package extension for distribution (OCI format).

Flags:

  • --version <version> - Extension version
  • --output <path> - Output file path

Examples:

provisioning extension package ./my-service --version 1.0.0

provisioning extension install <NAME|PATH>

Install an extension from registry or file.

Examples:

# Install from registry
provisioning extension install kubernetes

# Install from local file
provisioning extension install ./my-service.tar.gz

provisioning extension list

List installed extensions.

Flags:

  • --type <type> - Filter by type
  • --available - Show available (not installed)

Utility Commands

Helper commands and tools.

provisioning version

Show platform version information.

Flags:

  • --check - Check for updates

Examples:

provisioning version
provisioning version --check

provisioning health

Check platform health.

Flags:

  • --service <name> - Check specific service

Examples:

# Check all services
provisioning health

# Check specific service
provisioning health --service orchestrator

provisioning diagnostics

Run platform diagnostics.

Flags:

  • --output <path> - Save diagnostic report

Examples:

provisioning diagnostics --output diagnostics.json

provisioning setup versions

Generate versions file from Nickel schemas.

Examples:

# Generate /provisioning/core/versions file
provisioning setup versions

# Use in shell scripts
source /provisioning/core/versions
echo "Nushell version: $NU_VERSION"

Generation Commands

Generate schemas, configurations, and infrastructure code.

provisioning generate config <TYPE>

Generate configuration templates.

Types: workspace, infrastructure, provider

Flags:

  • --output <path> - Output file path
  • --format <format> - Output format (nickel/yaml/toml)

Examples:

# Generate workspace config
provisioning generate config workspace --output config.ncl

# Generate infrastructure template
provisioning generate config infrastructure --format nickel

provisioning generate schema <NAME>

Generate Nickel schema from existing configuration.

provisioning generate docs

Generate documentation from schemas.

Authentication Commands

Manage authentication and user accounts.

provisioning auth login

Authenticate to the platform.

Flags:

  • --user <username> - Username
  • --password <password> - Password (prompt if not provided)
  • --mfa <code> - MFA code

Examples:

# Interactive login
provisioning auth login --user admin

# Login with MFA
provisioning auth login --user admin --mfa 123456

provisioning auth logout

Logout and invalidate tokens.

provisioning auth token

Display or refresh authentication token.

Flags:

  • --refresh - Refresh the token

provisioning auth user create <USERNAME>

Create a new user (admin only).

Flags:

  • --email <email> - User email
  • --roles <roles> - Comma-separated roles

Examples:

provisioning auth user create developer --email dev@example.com --roles developer,operator

provisioning auth user list

List all users (admin only).

provisioning auth user delete <USERNAME>

Delete a user (admin only).

Security Commands

Manage secrets, encryption, audit logs, and policies.

Vault Commands

provisioning vault store <PATH>

Store a secret.

Flags:

  • --value <value> - Secret value
  • --file <path> - Read value from file

Examples:

# Store secret interactively
provisioning vault store database/postgres/password

# Store from value
provisioning vault store api/key --value "secret-value"

# Store from file
provisioning vault store ssh/key --file ~/.ssh/id_rsa

provisioning vault get <PATH>

Retrieve a secret.

Flags:

  • --version <num> - Specific version
  • --output <path> - Save to file

Examples:

# Get latest secret
provisioning vault get database/postgres/password

# Get specific version
provisioning vault get database/postgres/password --version 2

provisioning vault list

List all secret paths.

Flags:

  • --prefix <prefix> - Filter by path prefix

provisioning vault delete <PATH>

Delete a secret.

KMS Commands

provisioning kms encrypt <FILE>

Encrypt a file or data.

Flags:

  • --key <id> - Key ID
  • --output <path> - Output file

Examples:

# Encrypt file
provisioning kms encrypt config.yaml --key master-key --output config.enc

# Encrypt string
echo "sensitive data" | provisioning kms encrypt --key master-key

provisioning kms decrypt <FILE>

Decrypt encrypted data.

Flags:

  • --output <path> - Output file

provisioning kms create-key <ID>

Create a new encryption key.

Flags:

  • --algorithm <algo> - Algorithm (default: AES-256-GCM)

provisioning kms list-keys

List all encryption keys.

provisioning kms rotate-key <ID>

Rotate an encryption key.

Audit Commands

provisioning audit query

Query audit logs.

Flags:

  • --user <user> - Filter by user
  • --action <action> - Filter by action
  • --resource <resource> - Filter by resource
  • --start <time> - Start time
  • --end <time> - End time
  • --limit <num> - Maximum results

Examples:

# Query recent audit logs
provisioning audit query --limit 100

# Query specific user actions
provisioning audit query --user admin --action workflow.submit

# Query time range
provisioning audit query --start "2026-01-15" --end "2026-01-16"

provisioning audit export

Export audit logs.

Flags:

  • --format <format> - Export format (json/csv/syslog/cef/splunk)
  • --start <time> - Start time
  • --end <time> - End time
  • --output <path> - Output file

Examples:

# Export as JSON
provisioning audit export --format json --output audit.json

# Export last 7 days as CSV
provisioning audit export --format csv --start "7 days ago" --output audit.csv

provisioning audit compliance

Generate compliance report.

Flags:

  • --standard <standard> - Compliance standard (gdpr/soc2/iso27001)
  • --start <time> - Report start time
  • --end <time> - Report end time

Policy Commands

provisioning policy create <ID>

Create an authorization policy.

Flags:

  • --content <cedar> - Cedar policy content
  • --file <path> - Load from file
  • --description <text> - Policy description

Examples:

# Create from file
provisioning policy create developer-read --file policies/read-only.cedar

# Create inline
provisioning policy create admin-full --content "permit(principal in Role::\"admin\", action, resource);"

provisioning policy list

List all authorization policies.

provisioning policy evaluate

Evaluate a policy decision.

Flags:

  • --principal <entity> - Principal entity
  • --action <action> - Action
  • --resource <resource> - Resource

Examples:

provisioning policy evaluate \
  --principal "User::\"admin\"" \
  --action "Action::\"workflow.submit\"" \
  --resource "Workflow::\"deploy\""

provisioning policy update <ID>

Update an existing policy (hot reload).

provisioning policy delete <ID>

Delete an authorization policy.

Platform Commands

Control platform services.

provisioning platform service list

List all platform services and status.

provisioning platform service start <NAME>

Start a platform service.

Examples:

provisioning platform service start orchestrator

provisioning platform service stop <NAME>

Stop a platform service.

Flags:

  • --force - Force stop without graceful shutdown
  • --timeout <seconds> - Graceful shutdown timeout

provisioning platform service restart <NAME>

Restart a platform service.

provisioning platform service health <NAME>

Check service health.

provisioning platform metrics

Display platform-wide metrics.

Flags:

  • --watch - Continuously update metrics

Guides Commands

Access interactive guides and documentation.

provisioning guide from-scratch

Complete walkthrough from installation to first deployment.

provisioning guide update

Guide for updating the platform.

provisioning guide customize

Guide for customizing extensions.

provisioning sc

Quick reference shortcut guide (fastest).

provisioning help [COMMAND]

Display help for any command.

Examples:

# General help
provisioning help

# Command-specific help
provisioning help server create
provisioning server create --help  # Same result

Task Service Commands

provisioning taskserv install <NAME>

Install a task service on servers.

Flags:

  • --cluster <name> - Target cluster
  • --version <version> - Specific version
  • --servers <names> - Target servers (comma-separated)
  • --wait - Wait for installation
  • --yes - Skip confirmation

Examples:

# Install Kubernetes on cluster
provisioning taskserv install kubernetes --cluster prod --wait

# Install specific version
provisioning taskserv install kubernetes --version 1.29.0

# Install on specific servers
provisioning taskserv install containerd --servers web-01,web-02

provisioning taskserv remove <NAME>

Remove a task service.

Flags:

  • --cluster <name> - Target cluster
  • --purge - Remove all data
  • --yes - Skip confirmation

provisioning taskserv list

List installed task services.

Flags:

  • --available - Show available (not installed) services

provisioning taskserv status <NAME>

Get task service status.

Examples:

provisioning taskserv status kubernetes

Cluster Commands

provisioning cluster create <NAME>

Create a complete cluster from configuration.

Flags:

  • --infra <file> - Nickel infrastructure file
  • --type <type> - Cluster type (kubernetes/etcd/postgres)
  • --wait - Wait for creation
  • --yes - Skip confirmation
  • --check - Dry-run mode

Examples:

# Create Kubernetes cluster
provisioning cluster create prod-k8s --infra k8s-cluster.ncl --wait

# Check what would be created
provisioning cluster create staging --infra staging.ncl --check

provisioning cluster delete <NAME>

Delete a cluster and all resources.

Flags:

  • --keep-data - Preserve data volumes
  • --yes - Skip confirmation

provisioning cluster list

List all clusters.

provisioning cluster status <NAME>

Get detailed cluster status.

Examples:

provisioning cluster status prod-k8s

provisioning cluster scale <NAME>

Scale cluster nodes.

Flags:

  • --workers <num> - Number of worker nodes
  • --control-plane <num> - Number of control plane nodes

Examples:

# Scale workers to 5 nodes
provisioning cluster scale prod-k8s --workers 5

Test Commands

provisioning test quick <TASKSERV>

Quick test of a task service in container.

Examples:

provisioning test quick kubernetes
provisioning test quick postgres

provisioning test topology load <NAME>

Load a test topology template.

provisioning test env create

Create a test environment.

Flags:

  • --topology <name> - Topology template
  • --services <names> - Services to install

provisioning test env list

List active test environments.

provisioning test env cleanup <ID>

Cleanup a test environment.

Environment Variables

The CLI respects these environment variables:

  • PROVISIONING_WORKSPACE - Override active workspace
  • PROVISIONING_CONFIG - Custom config file path
  • PROVISIONING_LOG_LEVEL - Log level (debug/info/warn/error)
  • PROVISIONING_API_URL - API endpoint URL
  • PROVISIONING_TOKEN - Auth token (overrides login)
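A wrapper script can gather these overrides before shelling out to the CLI. This is a minimal sketch; the `info` fallback for the log level is an assumption, not a documented default.

```python
import os

def provisioning_env():
    """Collect the PROVISIONING_* overrides documented above.

    The "info" log-level fallback is an assumed default.
    """
    return {
        "workspace": os.environ.get("PROVISIONING_WORKSPACE"),
        "config": os.environ.get("PROVISIONING_CONFIG"),
        "log_level": os.environ.get("PROVISIONING_LOG_LEVEL", "info"),
        "api_url": os.environ.get("PROVISIONING_API_URL"),
        "token": os.environ.get("PROVISIONING_TOKEN"),
    }
```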

Exit Codes

| Code | Meaning |
|------|---------|
| 0 | Success |
| 1 | General error |
| 2 | Invalid usage |
| 3 | Configuration error |
| 4 | Authentication error |
| 5 | Permission denied |
| 6 | Resource not found |
| 7 | Operation failed |
| 8 | Timeout |
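Scripts wrapping the CLI can branch on these codes. The helper below simply encodes the table above; the choice of which codes are worth retrying is a judgment call, not platform-defined behavior.

```python
# Meanings copied from the exit-code table above.
EXIT_CODES = {
    0: "Success",
    1: "General error",
    2: "Invalid usage",
    3: "Configuration error",
    4: "Authentication error",
    5: "Permission denied",
    6: "Resource not found",
    7: "Operation failed",
    8: "Timeout",
}

def describe_exit(code: int) -> str:
    return EXIT_CODES.get(code, f"Unknown exit code {code}")

def is_retryable(code: int) -> bool:
    # Timeouts and generic operation failures are reasonable retry
    # candidates; auth and usage errors are not (an assumption).
    return code in (7, 8)
```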

Shell Completion

Generate shell completion scripts:

# Bash
provisioning completion bash > /etc/bash_completion.d/provisioning

# Zsh
provisioning completion zsh > ~/.zsh/completion/_provisioning

# Fish
provisioning completion fish > ~/.config/fish/completions/provisioning.fish

Nushell Libraries

Orchestrator API

Control Center API

Examples

Provisioning Logo

Provisioning

Architecture

Deep dive into Provisioning platform architecture, design principles, and architectural decisions that shape the system.

Overview

The Provisioning platform uses a modular, microservice-based architecture for enterprise Infrastructure as Code across multiple clouds. This section documents the foundational architectural decisions and system design that enable:

  • Multi-cloud orchestration across AWS, UpCloud, Hetzner, Kubernetes, and on-premise systems
  • Workspace-first organization with complete infrastructure isolation and multi-tenancy support
  • Type-safe configuration using Nickel language as source of truth
  • Autonomous operations through intelligent detectors and automated incident response
  • Post-quantum security with hybrid encryption protecting against future threats

Architecture Documentation

System Understanding

*Diagram: system architecture overview with 12 microservices*

  • System Overview - Platform architecture with 12 microservices, 80+ CLI commands, multi-tenancy model, cloud integration

  • Design Principles - Configuration-driven design, workspace isolation, type-safety mandates, autonomous operations, security-first

  • Component Architecture - 12 microservices: Orchestrator, Control-Center, Vault-Service, Extension-Registry, AI-Service, Detector, RAG, MCP-Server, KMS, Platform-Config, Service-Clients

  • Integration Patterns - REST APIs, async message queues, event-driven workflows, service discovery, state management

*Diagram: microservices communication patterns (REST, async, events)*

Architectural Decisions

  • Architecture Decision Records (ADRs) - 10 decisions: modular CLI, workspace-first design, Nickel type-safety, microservice distribution, communication, post-quantum cryptography, encryption, observability, SLO management, incident automation

Key Architectural Patterns

Modular Design (ADR-001)

  • Decentralized CLI command registration reducing code by 84%
  • Dynamic command discovery and 80+ keyboard shortcuts
  • Extensible architecture supporting custom commands

Workspace-First Organization (ADR-002)

  • Workspaces as primary organizational unit grouping infrastructure, configs, and state
  • Complete isolation for multi-tenancy and team collaboration
  • Local schema and extension customization per workspace

Type-Safe Configuration (ADR-003)

  • Nickel language as source of truth for all infrastructure definitions
  • Mandatory schema validation at parse time (not runtime)
  • Complete migration from KCL with backward compatibility

Distributed Microservices (ADR-004)

  • 12 specialized microservices handling specific domains
  • Independent scaling and deployment per service
  • Service communication via REST + async queues

Security Architecture (ADR-006 & ADR-007)

  • Post-quantum cryptography with CRYSTALS-Kyber hybrid encryption
  • Multi-layer encryption: at-rest (KMS), in-transit (TLS 1.3), field-level, end-to-end
  • Centralized secrets management via SecretumVault

Observability & Resilience (ADR-008, ADR-009, ADR-010)

  • Unified observability: Prometheus metrics, ELK logging, Jaeger tracing
  • SLO-driven operations with error budget enforcement
  • Autonomous incident detection and self-healing
  • For implementation details → See provisioning/docs/src/features/
  • For API documentation → See provisioning/docs/src/api-reference/
  • For deployment guides → See provisioning/docs/src/operations/
  • For security details → See provisioning/docs/src/security/
  • For development → See provisioning/docs/src/development/

System Overview

Complete architecture of the Provisioning Infrastructure Automation Platform.

Architecture Layers

Provisioning uses a 5-layer modular architecture:

┌─────────────────────────────────────────────────────────────┐
│ User Interface Layer                                        │
│ • CLI (provisioning command)  • Web Control Center (UI)     │
│ • REST API  • MCP Server (AI) • Batch Scheduler             │
└──────────────────────────┬──────────────────────────────────┘
                           ↓
┌─────────────────────────────────────────────────────────────┐
│ Core Engine Layer (provisioning/core/)                      │
│ • 211-line CLI dispatcher (84% code reduction)              │
│ • 476+ configuration accessors (hierarchical)               │
│ • Provider abstraction (multi-cloud support)                │
│ • Workspace management system                               │
│ • Infrastructure validation (54+ Nushell libraries)         │
│ • Secrets management (SOPS + Age integration)               │
└──────────────────────────┬──────────────────────────────────┘
                           ↓
┌─────────────────────────────────────────────────────────────┐
│ Orchestration Layer (provisioning/platform/)                │
│ • Hybrid Orchestrator (Rust + Nushell)                      │
│ • Workflow execution with checkpoints                       │
│ • Dependency resolver & task scheduler                      │
│ • File-based persistence                                    │
│ • REST API endpoints (83+)                                  │
│ • State management (SurrealDB)                              │
└──────────────────────────┬──────────────────────────────────┘
                           ↓
┌─────────────────────────────────────────────────────────────┐
│ Extension Layer (provisioning/extensions/)                  │
│ • Cloud Providers (UpCloud, AWS, Hetzner, Local)            │
│ • Task Services (50+ services in 18 categories)             │
│ • Clusters (9 pre-built cluster templates)                  │
│ • Batch Workflows (automation templates)                    │
│ • Nushell Plugins (10-50x performance gains)                │
└──────────────────────────┬──────────────────────────────────┘
                           ↓
┌─────────────────────────────────────────────────────────────┐
│ Infrastructure Layer                                        │
│ • Cloud Resources (servers, networks, storage)              │
│ • Running Services (Kubernetes, databases, etc.)            │
│ • State Persistence (SurrealDB, file storage)               │
│ • Monitoring & Logging (Prometheus, Loki)                   │
└─────────────────────────────────────────────────────────────┘

Core System Components

1. CLI Layer (provisioning/core/cli/)

Entry Point: provisioning/core/cli/provisioning

  • Bash wrapper (210 lines) - Minimal bootstrap
  • Routes commands to Nushell dispatcher
  • Loads environment and validates workspace
  • Handles error reporting

Key Features:

  • Single entry point
  • Pluggable architecture
  • Support for 111+ commands
  • 80+ shortcuts for productivity

2. Core Engine (provisioning/core/nulib/)

Structure: 54 Nushell libraries organized by function

Main Components:

Configuration Management (lib_provisioning/config/)

  • Hierarchical loading: 5-layer precedence system
  • 476+ accessors: Type-safe configuration access
  • Variable interpolation: Template expansion
  • TOML merging: Environment-specific overrides
  • Validation: Schema enforcement

Provider Abstraction (lib_provisioning/providers/)

  • Multi-cloud support: UpCloud, AWS, Hetzner, Local
  • Unified interface: Single API for all providers
  • Dynamic loading: Load providers on-demand
  • Credential management: Encrypted credential handling
  • State tracking: Provider-specific state persistence

Workspace Management (lib_provisioning/workspace/)

  • Workspace registry: Track all workspaces
  • Switching: Atomic workspace transitions
  • Isolation: Independent state per workspace
  • Configuration loading: Workspace-specific overrides
  • Extensions: Inherit from platform extensions

Infrastructure Validation (lib_provisioning/infra_validator/)

  • Schema validation: Nickel contract checking
  • Constraint enforcement: Business rule validation
  • Dependency analysis: Infrastructure dependency graph
  • Type checking: Static type validation
  • Error reporting: Detailed error messages with suggestions

Secrets Management (lib_provisioning/secrets/)

  • SOPS integration: Mozilla SOPS for encryption
  • Age encryption: Modern symmetric encryption
  • KMS backends: Cosmian, AWS KMS, local
  • Credential injection: Runtime variable substitution
  • Audit logging: Track secret access

Command Utilities (lib_provisioning/cmd/)

  • SSH operations: Remote command execution
  • Batch operations: Parallel command execution
  • Error handling: Structured error reporting
  • Logging: Comprehensive operation logging
  • Retry logic: Automatic retry with backoff

3. Orchestration Engine (provisioning/platform/)

Technology: Rust + Nushell hybrid

12 Microservices (Rust crates):

| Service | Purpose | Key Features |
|---------|---------|--------------|
| orchestrator | Workflow execution | Scheduler, file persistence, REST API |
| control-center | API gateway + auth | RBAC, Cedar policies, audit logging |
| control-center-ui | Web dashboard | Infrastructure view, config management |
| mcp-server | AI integration | Model Context Protocol, auto-completion |
| vault-service | Secrets storage | Encryption, KMS, credential injection |
| extension-registry | OCI registry | Extension distribution, versioning |
| ai-service | LLM features | Prompt optimization, context awareness |
| detector | Anomaly detection | Health monitoring, pattern recognition |
| rag | Knowledge retrieval | Document embedding, semantic search |
| provisioning-daemon | Background service | Event monitoring, task scheduling |
| platform-config | Config management | Schema validation, environment handling |
| service-clients | API clients | SDK for platform services, cloud APIs |

Detailed Services:

Orchestrator (crates/orchestrator/)

  • High-performance scheduler: Rust core
  • File-based persistence: Durable queue
  • Workflow execution: Dependency-aware scheduling
  • Checkpoint recovery: Resume from failures
  • Parallel execution: Multi-task handling
  • State management: Track job status
  • REST API: 9 core endpoints
  • Port: 9090 (health check endpoint)

Control Center (crates/control-center/)

  • Authorization engine: Cedar policy enforcement
  • RBAC system: Role-based access control
  • Audit logging: Complete audit trail
  • API gateway: REST API for all operations
  • System configuration: Central configuration management
  • Health monitoring: Real-time system status

Control Center UI (crates/control-center-ui/)

  • Web dashboard: Real-time infrastructure view
  • Workflow visualization: Batch job monitoring
  • Configuration management: Web-based configuration
  • Resource explorer: Browse infrastructure
  • Audit viewer: Security audit trail

MCP Server (crates/mcp-server/)

  • AI integration: Model Context Protocol support
  • Natural language: Parse infrastructure requests
  • Auto-completion: Intelligent configuration suggestions
  • 7 settings tools: Configuration management via LLM
  • Context-aware: Understand workspace context

Vault Service (crates/vault-service/)

  • Secrets backend: Encrypted credential storage
  • KMS integration: Key Management System support
  • SOPS + Age: SOPS encryption with Age keys
  • Credential injection: Secure credential delivery
  • Audit logging: Secret access tracking

Extension Registry (crates/extension-registry/)

  • OCI distribution: Container image distribution
  • Extension packaging: Provider/taskserv distribution
  • Version management: Semantic versioning
  • Registry API: Content addressable storage

AI Service (crates/ai-service/)

  • LLM integration: Large Language Model support
  • Prompt optimization: Infrastructure request parsing
  • Context awareness: Workspace context enrichment
  • Response generation: Configuration suggestions

Detector (crates/detector/)

  • Anomaly detection: System health monitoring
  • Pattern recognition: Infrastructure issue identification
  • Alert generation: Alerting system integration
  • Real-time monitoring: Continuous surveillance

Platform Config (crates/platform-config/)

  • Configuration management: Centralized config loading
  • Schema validation: Configuration validation
  • Environment handling: Multi-environment support
  • Default settings: System-wide defaults

Provisioning Daemon (crates/provisioning-daemon/)

  • Background service: Continuous operation
  • Event monitoring: System event handling
  • Task scheduling: Background job execution
  • State synchronization: Infrastructure state sync

RAG Service (crates/rag/)

  • Retrieval Augmented Generation: Knowledge base integration
  • Document embedding: Semantic search
  • Context retrieval: Intelligent response context
  • Knowledge synthesis: Answer generation

Service Clients (crates/service-clients/)

  • API clients: Client SDK for platform services
  • Cloud providers: Multi-cloud provider SDKs
  • Request handling: HTTP/RPC client utilities
  • Connection pooling: Efficient resource management

4. Extensions (provisioning/extensions/)

Modular infrastructure components:

Providers (5 cloud providers)

  • UpCloud - Primary European cloud
  • AWS - Amazon Web Services
  • Hetzner - Baremetal & cloud servers
  • Local - Development environment
  • Demo - Testing & mocking

Each provider includes:

  • Nickel schemas for configuration
  • API client implementation
  • Server creation/deletion logic
  • Network management
  • State tracking

Task Services (50+ services in 18 categories)

| Category | Services | Purpose |
|----------|----------|---------|
| Container Runtime | containerd, crio, podman, crun, youki, runc | Container execution |
| Kubernetes | kubernetes, etcd, coredns, cilium, flannel, calico | Orchestration |
| Storage | rook-ceph, local-storage, mayastor, external-nfs | Data persistence |
| Databases | postgres, redis, mysql, mongodb | Data management |
| Networking | ip-aliases, proxy, resolv, kms | Network services |
| Security | webhook, kms, oras, radicle | Security services |
| Observability | prometheus, grafana, loki, jaeger | Monitoring & logging |
| Development | gitea, coder, desktop, buildkit | Developer tools |
| Hypervisor | kvm, qemu, libvirt | Virtualization |

Clusters (9 pre-built templates)

  • web - Web service cluster (nginx + postgres)
  • oci-reg - Container registry
  • git - Git hosting (Gitea)
  • buildkit - Build infrastructure
  • k8s-ha - HA Kubernetes (3 control planes)
  • postgresql - HA PostgreSQL cluster
  • cicd-argocd - GitOps CI/CD
  • cicd-tekton - Tekton pipelines

5. Infrastructure Layer

What Provisioning Manages:

  • Cloud Resources: VMs, networks, storage
  • Services: Kubernetes, databases, monitoring
  • Applications: Web services, APIs, tools
  • State: Configuration, data, logs
  • Monitoring: Metrics, traces, logs

Configuration System

Hierarchical 5-Layer System:

Precedence (High → Low):

1. Runtime Arguments   (CLI flags: --provider upcloud)
   ↓
2. Environment Variables (PROVISIONING_PROVIDER=aws)
   ↓
3. Workspace Config    (~workspace/config/provisioning.yaml)
   ↓
4. Environment Defaults (workspace/config/prod-defaults.toml)
   ↓
5. System Defaults     (~/.config/provisioning/ + platform defaults)
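The precedence rule above can be sketched as a recursive merge where higher layers override lower ones. This is a minimal Python illustration, not the platform's implementation (which merges Nickel/TOML/YAML sources in Nushell); the layer contents are hypothetical:

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge `override` into `base`; the override wins on conflict."""
    result = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = value
    return result

# Layers listed low -> high precedence, then folded left to right.
layers = [
    {"provider": "local", "server": {"plan": "small"}},  # 5. system defaults
    {"server": {"plan": "medium"}},                      # 4. environment defaults
    {"provider": "aws"},                                 # 3. workspace config
    {"provider": "upcloud"},                             # 2. environment variables
    {},                                                  # 1. runtime arguments
]

merged = {}
for layer in layers:
    merged = deep_merge(merged, layer)

print(merged)  # {'provider': 'upcloud', 'server': {'plan': 'medium'}}
```

Note how the nested `server` record is merged recursively rather than replaced wholesale, which is what lets an environment default override a single key without discarding the rest of the layer below it.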

Configuration Languages:

| Format | Purpose | Validation | Editability |
|--------|---------|------------|-------------|
| Nickel | Infrastructure source | ✅ Type-safe, contracts | Direct |
| TOML | Settings, defaults | Schema validation | Direct |
| YAML | User config, metadata | Schema validation | Direct |
| JSON | Exported configs | Schema validation | Generated |

Key Features:

  • Lazy evaluation
  • Recursive merging
  • Variable interpolation
  • Constraint checking
  • Automatic validation

State Management

SurrealDB Graph Database:

Stores complex infrastructure relationships:

Nodes:
- Servers (compute)
- Networks (connectivity)
- Storage (persistence)
- Services (software)
- Workflows (automation)

Edges:
- Server → Network (connected)
- Server → Storage (mounted)
- Service → Server (running on)
- Workflow → Dependency (depends on)
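The node/edge model above can be pictured as a set of typed edges that relationship queries traverse. A minimal Python sketch (the real platform stores these in SurrealDB; the node names here are hypothetical):

```python
# Each edge is (source, relation, target), mirroring the list above.
edges = [
    ("web-01", "connected", "net-private"),
    ("web-01", "mounted", "vol-data"),
    ("nginx", "running_on", "web-01"),
    ("deploy-wf", "depends_on", "create-server-wf"),
]

def neighbors(node: str, relation: str) -> list[str]:
    """All targets reachable from `node` via edges of type `relation`."""
    return [dst for src, rel, dst in edges if src == node and rel == relation]

print(neighbors("web-01", "connected"))  # ['net-private']
print(neighbors("deploy-wf", "depends_on"))  # ['create-server-wf']
```

A graph database answers these traversals (and multi-hop variants) without the join gymnastics a relational schema would require, which is the rationale behind ADR-009.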

File-Based Persistence:

For orchestrator queue and checkpoints:

~/.provisioning/
├── state/              # Infrastructure state
├── checkpoints/        # Workflow checkpoints
├── queue/              # Orchestrator queue
└── logs/               # Operational logs

Security Architecture

4-Layer Security Model:

| Layer | Components | Features |
|-------|------------|----------|
| Authentication | JWT, sessions, MFA | 2FA, TOTP, WebAuthn |
| Authorization | Cedar policies, RBAC | Fine-grained permissions |
| Encryption | AES-256-GCM, TLS | At-rest & in-transit |
| Audit | Logging, compliance | 7-year retention |

Security Services:

  • JWT token validation
  • Argon2id password hashing
  • Multi-factor authentication
  • Cedar policy enforcement
  • Encrypted credential storage
  • KMS integration (5 backends)
  • Audit logging (5 export formats)
  • Compliance checking (SOC2, GDPR, HIPAA)

Performance Characteristics

Modular CLI (84% code reduction):

  • Main CLI: 211 lines (vs. 1,329 before)
  • Command discovery: O(1) dispatcher
  • Lazy loading: Commands loaded on-demand
  • Caching: Configuration cached after first load

Orchestrator Performance:

  • Dependency resolution: O(n log n) topological sort
  • Parallel execution: Configurable task limit
  • Checkpoint recovery: Resume from failure point
  • Memory efficient: File-based queue
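The dependency resolution described above is a topological sort; using a heap for the ready set gives the stated O(n log n) bound and a deterministic order. This is an illustrative Python sketch with hypothetical task names, not the orchestrator's Rust implementation:

```python
import heapq
from collections import defaultdict

def schedule(tasks: dict[str, list[str]]) -> list[str]:
    """Order tasks so every dependency runs before its dependents (Kahn's algorithm)."""
    indegree = {task: 0 for task in tasks}
    dependents = defaultdict(list)
    for task, deps in tasks.items():
        for dep in deps:
            dependents[dep].append(task)
            indegree[task] += 1
    ready = [task for task, deg in indegree.items() if deg == 0]
    heapq.heapify(ready)  # heap keeps the ordering deterministic
    order = []
    while ready:
        task = heapq.heappop(ready)
        order.append(task)
        for nxt in dependents[task]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                heapq.heappush(ready, nxt)
    if len(order) != len(tasks):
        raise ValueError("dependency cycle detected")
    return order

print(schedule({
    "create-server": [],
    "install-k8s": ["create-server"],
    "deploy-app": ["install-k8s"],
}))  # ['create-server', 'install-k8s', 'deploy-app']
```

Tasks whose dependencies are all satisfied can be popped concurrently, which is where the configurable parallel-execution limit applies.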

Provider Operations:

  • Batch creation: Parallel server provisioning
  • Bulk operations: Multi-resource transactions
  • State tracking: Efficient state queries
  • Rollback: Atomic operation reversal

Nushell Plugins (10-50x speedup):

  • Compiled Rust extensions
  • Direct native code execution
  • Zero-copy data passing
  • Async I/O support

Deployment Modes

Three Operational Modes:

| Mode | Interaction | Configuration | Rollback | Use Case |
|------|-------------|---------------|----------|----------|
| Interactive TUI | Ratatui UI | Manual input | Automatic | Development |
| Headless CLI | Command-line | Script-driven | Manual | Automation |
| Unattended CI/CD | Non-interactive | Configuration file | Automatic | CI/CD pipelines |

Technology Stack

| Component | Technology | Why |
|-----------|------------|-----|
| IaC Language | Nickel | Type-safe, lazy evaluation, contracts |
| Scripting | Nushell 0.109+ | Structured data pipelines |
| Performance | Rust | Zero-cost abstractions, memory safety |
| State | SurrealDB | Graph database for relationships |
| Encryption | SOPS + Age | Industry-standard encryption |
| Security | Cedar + JWT | Policy enforcement + tokens |
| Orchestration | Custom | Specialized for infrastructure workflows |

File Organization

provisioning/
├── core/                       # CLI engine (Nushell)
│   ├── cli/provisioning       # Main entry point
│   ├── nulib/                 # 54 core libraries
│   ├── plugins/               # Nushell plugins (Rust)
│   └── scripts/               # Utility scripts
│
├── platform/                   # Microservices (Rust)
│   ├── crates/                # 12 microservices
│   │   ├── orchestrator/      # Workflow scheduler
│   │   ├── control-center/    # API gateway + auth
│   │   ├── control-center-ui/ # Web dashboard
│   │   ├── mcp-server/        # AI integration
│   │   ├── vault-service/     # Secrets backend
│   │   ├── extension-registry/ # OCI registry
│   │   ├── ai-service/        # LLM features
│   │   ├── detector/          # Anomaly detection
│   │   ├── rag/               # Knowledge retrieval
│   │   ├── provisioning-daemon/ # Background service
│   │   ├── platform-config/   # Config management
│   │   └── service-clients/   # API clients
│   └── Cargo.toml             # Rust workspace
│
├── extensions/                # Extensible components
│   ├── providers/             # Cloud providers (5)
│   ├── taskservs/             # Task services (50+)
│   ├── clusters/              # Cluster templates (9)
│   └── workflows/             # Automation templates
│
├── schemas/                   # Nickel schemas
│   ├── main.ncl              # Entry point
│   ├── config/               # Configuration schemas
│   ├── infrastructure/       # Infrastructure schemas
│   ├── operations/           # Operational schemas
│   └── [other schemas]       # Additional schemas
│
├── config/                    # System configuration
│   └── config.defaults.toml  # Default settings
│
├── bootstrap/                 # Installation
│   ├── install.sh            # Bash bootstrap
│   └── install.nu            # Nushell installer
│
├── docs/                      # Product documentation
│   └── src/                  # mdBook source
│
└── README.md                  # Project overview

Component Interaction

Typical Workflow:

User Input
   ↓
CLI Dispatcher (provisioning/core/cli/provisioning)
   ↓
Nushell Handler (provisioning/core/nulib/commands/)
   ↓
Configuration Loading (lib_provisioning/config/)
   ↓
Provider Selection (lib_provisioning/providers/)
   ↓
Validation (lib_provisioning/infra_validator/)
   ↓
Orchestrator Queue (provisioning/platform/orchestrator/)
   ↓
Task Execution (provider + task service)
   ↓
State Update (SurrealDB / file storage)
   ↓
Audit Logging (security system)
   ↓
User Feedback

Scalability

Provisioning scales across four deployment tiers:

  • Solo: 2 CPU cores, 4GB RAM (single instance)
  • MultiUser: 4-8 CPU cores, 8GB RAM (small team)
  • CICD: 8+ CPU cores, 16GB RAM (enterprise)
  • Enterprise: Multi-node Kubernetes (unlimited)

Bottlenecks & Solutions:

| Component | Bottleneck | Solution |
|-----------|------------|----------|
| Orchestrator | Task queue | Partition by workspace |
| State | SurrealDB | Horizontal scaling |
| Providers | API rate limits | Exponential backoff |
| Storage | Disk I/O | SSD + caching |
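The exponential-backoff remedy for provider rate limits can be sketched in a few lines. This is an illustrative Python helper with a hypothetical call signature, not the platform's retry implementation:

```python
import time

def with_backoff(call, max_attempts: int = 5, base_delay: float = 0.5):
    """Retry `call` with delays 0.5s, 1s, 2s, ... doubling each attempt."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** attempt)
```

A rate-limited API call would be wrapped as `with_backoff(lambda: client.create_server(...))`; doubling the delay each attempt spreads retries out so a burst of clients does not hammer the provider in lockstep.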

Integration Points

Provisioning integrates with:

  • Kubernetes API - Cluster management
  • Cloud Provider APIs - Resource provisioning
  • SOPS + Age - Secrets encryption
  • Prometheus - Metrics collection
  • Cedar - Policy enforcement
  • SurrealDB - State persistence
  • MCP - AI integration
  • KMS - Key management (Cosmian, AWS, local)

Reliability Features

Fault Tolerance:

  • Checkpoint recovery - Resume from failure
  • Automatic rollback - Revert failed operations
  • Retry logic - Exponential backoff
  • Health checks - Continuous monitoring
  • Backup & restore - Data protection

High Availability:

  • Multi-node orchestrator
  • Database replication
  • Service redundancy
  • Load balancing
  • Failover automation

Design Principles

Core principles guiding Provisioning architecture and development.

1. Workspace-First Design

Principle: Workspaces are the default organizational unit for ALL infrastructure work.

Why:

  • Explicit project isolation
  • Prevent accidental cross-project modifications
  • Independent credential management
  • Clear configuration boundaries
  • Team collaboration enablement

Application:

  • Every workspace has independent state
  • Workspace switching is atomic
  • Configuration per workspace
  • Extensions inherited from platform

Code Example:

# Workspace-enforced workflow
provisioning workspace init my-project
provisioning workspace switch my-project

# This command requires active workspace
provisioning server create --name web-01

Impact: All commands validate active workspace before execution.


2. Type-Safety Mandatory

Principle: ALL configurations MUST be type-safe. Validation is NEVER optional.

Why:

  • Catch errors at configuration time
  • Prevent runtime failures
  • Enable IDE support (LSP)
  • Enforce consistency
  • Reduce deployment risk

Application:

  • Nickel is source of truth (NOT TOML)
  • Type contracts on ALL schemas
  • Gradual typing not allowed
  • Validation in ALL profiles (dev, prod, cicd)
  • Static analysis before deployment

Code Example:

# Type-safe infrastructure definition
{
  name : String = "server-01",
  plan : [| 'small, 'medium, 'large |] = 'medium,
  zone : String = "de-fra1",
  backup_enabled : Bool = false,
} | ServerContract

Impact: Type errors caught before infrastructure changes.


3. Configuration-Driven, Never Hardcoded

Principle: Configuration is the source of truth. Hardcoded values are forbidden.

Why:

  • Enable environment-specific behavior
  • Support multiple deployment modes
  • Allow runtime reconfiguration
  • Audit configuration changes
  • Team collaboration

Application:

  • 5-layer configuration hierarchy
  • 476+ configuration accessors
  • Variable interpolation
  • Environment-specific overrides
  • Schema validation

Code Example:

# Configuration drives behavior
provisioning server create --plan (config.server.default_plan)

# Environment-specific configs
PROVISIONING_ENV=prod provisioning server create

Forbidden:

# ❌ WRONG - Hardcoded values
let server_plan = "medium"

# ✅ RIGHT - Configuration-driven
let server_plan = (config.server.plan)

Impact: Single codebase supports all environments.


4. Multi-Cloud Abstraction

Principle: Provider-agnostic interfaces enable multi-cloud deployments.

Why:

  • Avoid vendor lock-in
  • Reuse infrastructure code
  • Support multiple cloud strategies
  • Easy provider switching

Application:

  • Unified provider interface
  • Abstract resource definitions
  • Provider-specific implementation
  • Automatic provider selection

Code Example:

# Provider-agnostic configuration
{
  servers = [
    {
      name = "web-01"
      plan = "medium"      # Abstract plan size
      provider = "upcloud" # Swappable provider
    }
  ]
}

Impact: Same Nickel schema deploys to UpCloud, AWS, or Hetzner.


5. Modular, Extensible Architecture

Principle: Components are loosely coupled, independently deployable.

Why:

  • Easy to add features
  • Support custom extensions
  • Avoid monolithic growth
  • Enable community contributions
  • Flexible deployment options

Application:

  • 54 core Nushell libraries
  • 111+ CLI commands in 7 domains
  • 50+ task services
  • 5 cloud providers
  • 9 cluster templates
  • Pluggable provider interface

Impact: Add features without modifying core system.


6. Hybrid Rust + Nushell

Principle: Rust for performance-critical components, Nushell for orchestration.

Why:

  • Rust: Type safety, zero-cost abstractions, performance
  • Nushell: Structured data, productivity, easy automation
  • Hybrid: Best of both worlds

Application:

  • Core CLI: Bash wrapper → Nushell dispatcher
  • Orchestrator: Rust scheduler + Nushell task execution
  • Libraries: Nushell for business logic
  • Performance: Rust plugins for 10-50x speedup

Impact: Fast, type-safe, productive infrastructure automation.


7. State Management via Graph Database

Principle: Infrastructure relationships tracked via SurrealDB graph.

Why:

  • Model complex infrastructure relationships
  • Query relationships efficiently
  • Track dependencies
  • Support rollback via state history
  • Audit trail

Application:

  • SurrealDB for relationship queries
  • File-based persistence for queue
  • Event-driven state updates
  • Checkpoint-based recovery

Example Relationships:

Server → Network (connected to)
Server → Storage (mounts)
Cluster → Service (runs)
Workflow → Dependency (depends on)

Impact: Complex infrastructure relationships handled gracefully.


8. Security-First Design

Principle: Security is built-in, not bolted-on.

Why:

  • Enterprise compliance
  • Data protection
  • Access control
  • Audit trails
  • Threat detection

Application:

  • 4-layer security model (auth, authz, encryption, audit)
  • JWT authentication
  • Cedar policy enforcement
  • AES-256-GCM encryption
  • 7-year audit retention
  • MFA support (TOTP, WebAuthn)

Impact: Enterprise-grade security by default.


9. Progressive Disclosure

Principle: Simple for common cases, powerful for advanced use cases.

Why:

  • Low barrier to entry
  • Professional productivity
  • Advanced features available
  • Avoid overwhelming users
  • Gradual learning curve

Application:

  • Simple: Interactive TUI installer
  • Productive: CLI with 80+ shortcuts
  • Powerful: Batch workflows, policies
  • Advanced: Custom extensions, hooks

Impact: All skill levels supported.


10. Fail-Fast, Recover Gracefully

Principle: Detect issues early, provide recovery mechanisms.

Why:

  • Prevent invalid deployments
  • Enable safe recovery
  • Minimize blast radius
  • Audit failures for learning

Application:

  • Validation before execution
  • Checkpoint-based recovery
  • Automatic rollback on failure
  • Detailed error messages
  • Retry with exponential backoff

Code Example:

# Validate before deployment
provisioning validate config --strict

# Dry-run to check impact
provisioning --check server create

# Safe rollback on failure
provisioning workflow rollback --to-checkpoint

Impact: Safe infrastructure changes with confidence.


11. Observable & Auditable

Principle: All operations traceable, all changes auditable.

Why:

  • Compliance & regulation
  • Troubleshooting
  • Security investigation
  • Team accountability
  • Historical analysis

Application:

  • Comprehensive audit logging
  • 5 export formats (JSON, YAML, CSV, syslog, CloudWatch)
  • Structured log entries
  • Operation tracing
  • Resource change tracking

Impact: Complete visibility into infrastructure changes.


12. No Shortcuts on Reliability

Principle: Reliability features are standard, not optional.

Why:

  • Production requirements
  • Minimize downtime
  • Data protection
  • Business continuity
  • Trust & confidence

Application:

  • Checkpoint recovery
  • Automatic rollback
  • Health monitoring
  • Backup & restore
  • Multi-node deployment
  • Service redundancy

Impact: Enterprise-grade reliability standard.


Architectural Decision Records (ADRs)

Key decisions documenting rationale:

| ADR | Decision | Rationale |
|-----|----------|-----------|
| ADR-011 | Nickel Migration | Type-safety over KCL flexibility |
| ADR-010 | Config Strategy | 5-layer hierarchy over flat config |
| ADR-009 | SurrealDB | Graph relationships over relational |
| ADR-008 | Modular CLI | 80+ shortcuts over verbose commands |
| ADR-007 | Workspace-First | Isolation over global state |
| ADR-006 | Hybrid Architecture | Rust + Nushell for best of both |

Design Trade-offs

| Decision | Gain | Cost |
|----------|------|------|
| Type-Safety | Fewer errors | Learning curve |
| Config Hierarchy | Flexibility | Complexity |
| Workspace Isolation | Safety | Duplication |
| Modular CLI | Discoverability | No single command |
| SurrealDB | Relationships | Resource overhead |
| Strict Validation | Safety | Fast iteration friction |

Component Architecture

Detailed architecture of each major Provisioning component.

Core Components Map

User Interface
  ├─ CLI (Nushell dispatcher)
  ├─ Web Dashboard (Control Center UI)
  ├─ REST API (Control Center)
  └─ MCP Server (AI Integration)
       ↓
Core Engine (54 Nushell libraries)
  ├─ Configuration Management
  ├─ Provider Abstraction
  ├─ Workspace Management
  ├─ Infrastructure Validation
  ├─ Secrets Management
  └─ Command Utilities
       ↓
Platform Services (12 Rust microservices)
  ├─ Orchestrator (Workflow execution)
  ├─ Control Center (API + Auth)
  ├─ Control Center UI (Web dashboard)
  ├─ MCP Server (AI integration)
  ├─ Vault Service (Secrets backend)
  ├─ Extension Registry (OCI distribution)
  ├─ AI Service (LLM features)
  ├─ Detector (Anomaly detection)
  ├─ RAG (Knowledge retrieval)
  ├─ Provisioning Daemon (Background service)
  ├─ Platform Config (Configuration management)
  └─ Service Clients (API clients)
       ↓
Extensions (Modular infrastructure)
  ├─ Providers (5 cloud providers)
  ├─ Task Services (50+ services)
  ├─ Clusters (9 templates)
  └─ Workflows (Automation)
       ↓
Infrastructure (Running resources)
  ├─ Cloud Compute
  ├─ Networks & Storage
  ├─ Services
  └─ Monitoring

1. CLI Layer

Location: provisioning/core/cli/

Main Entry Point (provisioning)

Bash wrapper that:

  1. Detects Nushell installation
  2. Loads environment variables
  3. Validates workspace requirement
  4. Routes command to dispatcher
  5. Handles error reporting

Command Dispatcher

Location: provisioning/core/nulib/main_provisioning/dispatcher.nu

Supports:

  • 111+ commands across 7 domains
  • 80+ shortcuts for productivity
  • Bi-directional help (help workspace / workspace help)
  • Dynamic loading of command modules
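The dispatcher pattern above — O(1) lookup, shortcut expansion, and on-demand module loading — can be sketched in Python. Command names, module paths, and the `run` entry point are all hypothetical; the real dispatcher is written in Nushell:

```python
import importlib

# Map command -> module path; modules are imported only when invoked.
COMMANDS = {
    "server": "commands.server",
    "workspace": "commands.workspace",
}
ALIASES = {"s": "server", "ws": "workspace"}  # productivity shortcuts

def dispatch(name: str, *args):
    name = ALIASES.get(name, name)        # expand shortcut if present
    module_path = COMMANDS.get(name)      # O(1) dict lookup
    if module_path is None:
        raise SystemExit(f"unknown command: {name}")
    module = importlib.import_module(module_path)  # lazy load on first use
    return module.run(*args)
```

Because handlers are imported only when their command runs, startup cost stays flat as the command set grows — the property that keeps a 111+ command CLI at a small main entry point.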

2. Core Engine Components

Configuration Management

Location: provisioning/core/nulib/lib_provisioning/config/

Key Features:

  • Load merged configuration from 5 layers
  • 476+ accessors for config values
  • Variable interpolation & TOML merging
  • Schema validation
  • Configuration caching

Provider Abstraction

Location: provisioning/core/nulib/lib_provisioning/providers/

Supported Providers (5):

  • UpCloud - Primary European cloud
  • AWS - Amazon Web Services
  • Hetzner - Baremetal & cloud
  • Local - Development environment
  • Demo - Testing & mocking

Features:

  • Unified cloud provider interface
  • Dynamic provider loading
  • Credential management
  • Provider state tracking

Workspace Management

Location: provisioning/core/nulib/lib_provisioning/workspace/

Responsibilities:

  • Workspace registry tracking
  • Atomic workspace switching
  • Configuration isolation
  • Extension inheritance
  • State management

Workspace Registry:

workspaces:
  active: "my-project"
  registry:
    my-project:
      path: ~/.provisioning/workspaces/workspace_my_project
      created: 2026-01-16T10:30:00Z
      template: default
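Atomic workspace switching, as listed in the responsibilities above, is typically implemented as a write-to-temp-then-rename so a crash can never leave a half-written registry. An illustrative Python sketch with a hypothetical JSON registry layout:

```python
import json
import os
import pathlib
import tempfile

def switch_workspace(registry_path: pathlib.Path, name: str) -> None:
    """Set the active workspace, writing the registry atomically."""
    registry = json.loads(registry_path.read_text())
    if name not in registry["registry"]:
        raise ValueError(f"unknown workspace: {name}")
    registry["active"] = name
    # Write to a temp file in the same directory, then rename over the
    # original: os.replace is atomic on POSIX filesystems.
    fd, tmp = tempfile.mkstemp(dir=registry_path.parent)
    with os.fdopen(fd, "w") as f:
        f.write(json.dumps(registry))
    os.replace(tmp, registry_path)
```

Readers always see either the old registry or the new one in full, which is what makes `provisioning workspace switch` safe to interrupt.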

Infrastructure Validation

Location: provisioning/core/nulib/lib_provisioning/infra_validator/

Validation Stages:

  1. Syntax check - Valid Nickel syntax
  2. Type check - Type correctness
  3. Schema check - Matches expected schema
  4. Constraint check - Business rule validation
  5. Dependency check - Infrastructure dependencies
  6. Security check - Security policies
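The staged pipeline above fails fast: each stage assumes the earlier ones passed, so validation stops at the first failure. A minimal Python sketch with two hypothetical stand-in stages (the real validators are Nushell modules):

```python
# Hypothetical stand-ins for the first two validation stages.
def check_syntax(cfg: dict) -> bool:
    return "name" in cfg

def check_types(cfg: dict) -> bool:
    return isinstance(cfg.get("name"), str)

STAGES = [("syntax", check_syntax), ("types", check_types)]

def validate(cfg: dict) -> list[str]:
    """Run stages in order; stop at the first failure."""
    errors = []
    for stage_name, stage in STAGES:
        if not stage(cfg):
            errors.append(f"{stage_name} check failed")
            break  # fail fast: later stages assume earlier ones passed
    return errors

print(validate({"name": 42}))        # ['types check failed']
print(validate({"name": "web-01"}))  # []
```

Stopping early keeps error output focused on the root cause instead of cascading failures from later stages that were doomed by the first one.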

Secrets Management

Location: provisioning/core/nulib/lib_provisioning/secrets/

Backends:

  • SOPS + Age (default)
  • Cosmian KMS (enterprise)
  • AWS KMS (AWS)
  • Local KMS (development)

3. Platform Services

Orchestrator

Location: provisioning/platform/crates/orchestrator/

Technology: Rust + Nushell

Key Features:

  • High-performance workflow execution
  • File-based persistence
  • Checkpoint recovery
  • Parallel execution with dependencies
  • REST API (83+ endpoints)
  • Priority-based task scheduling

State Persistence:

~/.provisioning/
├── queue/           # Task queue
├── checkpoints/     # Workflow checkpoints
└── state/           # Infrastructure state
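Checkpoint recovery over this layout works by recording each completed step durably, so a rerun resumes after the last success. An illustrative Python sketch with hypothetical step names and file layout (the orchestrator's actual format differs):

```python
import json
import pathlib
import tempfile

def run_workflow(steps, checkpoint: pathlib.Path) -> None:
    """Run steps in order, skipping any already recorded in the checkpoint."""
    done = set(json.loads(checkpoint.read_text())) if checkpoint.exists() else set()
    for name, action in steps:
        if name in done:
            continue  # completed in a previous run
        action()
        done.add(name)
        checkpoint.write_text(json.dumps(sorted(done)))  # durable progress marker

log = []
steps = [
    ("create", lambda: log.append("create")),
    ("configure", lambda: log.append("configure")),
]
cp = pathlib.Path(tempfile.mkdtemp()) / "workflow.json"
run_workflow(steps, cp)
run_workflow(steps, cp)  # second run finds both steps checkpointed
print(log)  # ['create', 'configure']
```

Because the checkpoint is written after every step rather than at the end, a failure midway loses at most the in-flight step, not the whole workflow.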

Control Center

Location: provisioning/platform/crates/control-center/

Technology: Rust (Axum)

Features:

  • JWT authentication
  • Cedar policy authorization
  • RBAC system
  • Audit logging
  • REST API for all operations

Authorization Model:

  • User roles (admin, user, viewer)
  • Fine-grained permissions
  • Cedar policy enforcement
  • Attribute-based access control

Control Center UI

Location: provisioning/platform/crates/control-center-ui/

Features:

  • Real-time infrastructure view
  • Workflow visualization
  • Configuration management
  • Resource monitoring
  • Audit log viewer

MCP Server

Location: provisioning/platform/crates/mcp-server/

Technology: Rust

Features:

  • AI-powered assistance via MCP
  • Natural language command parsing
  • Auto-completion of configurations
  • 7 configuration tools for LLM
  • Context-aware recommendations

Vault Service

Location: provisioning/platform/crates/vault-service/

Features:

  • Encrypted credential storage
  • KMS integration (5 backends)
  • SOPS + Age encryption
  • Secure credential injection
  • Audit logging for secret access

Extension Registry

Location: provisioning/platform/crates/extension-registry/

Features:

  • OCI-compliant distribution
  • Provider/taskserv packaging
  • Semantic version management
  • Content addressable storage
  • Registry API endpoints

AI Service

Location: provisioning/platform/crates/ai-service/

Features:

  • LLM integration platform
  • Infrastructure request parsing
  • Workspace context enrichment
  • Configuration suggestion generation
  • Multi-provider LLM support

Detector

Location: provisioning/platform/crates/detector/

Features:

  • System health monitoring
  • Anomaly pattern detection
  • Infrastructure issue identification
  • Real-time surveillance
  • Alerting system integration

RAG Service

Location: provisioning/platform/crates/rag/

Features:

  • Retrieval Augmented Generation
  • Document semantic embedding
  • Knowledge base integration
  • Context-aware answer generation
  • Multi-source knowledge synthesis

Provisioning Daemon

Location: provisioning/platform/crates/provisioning-daemon/

Features:

  • Background service operation
  • System event monitoring
  • Background job execution
  • Infrastructure state synchronization
  • Event-driven architecture

Platform Config

Location: provisioning/platform/crates/platform-config/

Features:

  • Centralized configuration loading
  • Schema-based validation
  • Multi-environment support
  • System-wide default settings
  • Configuration hot-reload support

Service Clients

Location: provisioning/platform/crates/service-clients/

Features:

  • Platform service client SDKs
  • Cloud provider API clients
  • HTTP/RPC request handling
  • Connection pooling and management
  • Retry logic and error handling

4. Extension Components

Providers

Location: provisioning/extensions/providers/

Structure:

providers/
├── upcloud/        # UpCloud provider
├── aws/            # AWS provider
├── hetzner/        # Hetzner provider
├── local/          # Local dev provider
├── demo/           # Demo/test provider
└── prov_lib/       # Shared utilities

Provider Interface:

  • Create/delete resources
  • List resources
  • Query resource status
  • Network/storage management
  • Credential validation
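The provider interface above can be expressed as an abstract base that every provider implements, which is what lets the core engine stay provider-agnostic. A Python sketch with hypothetical method names (real providers are Nushell modules with Nickel schemas):

```python
from abc import ABC, abstractmethod

class Provider(ABC):
    """Provider-agnostic interface mirroring the bullet list above."""

    @abstractmethod
    def create_server(self, name: str, plan: str) -> str: ...

    @abstractmethod
    def delete_server(self, server_id: str) -> None: ...

    @abstractmethod
    def list_servers(self) -> list[str]: ...

class DemoProvider(Provider):
    """In-memory mock, akin to the `demo` provider used for testing."""

    def __init__(self):
        self.servers: dict[str, str] = {}

    def create_server(self, name: str, plan: str) -> str:
        self.servers[name] = plan
        return name

    def delete_server(self, server_id: str) -> None:
        self.servers.pop(server_id, None)

    def list_servers(self) -> list[str]:
        return sorted(self.servers)

provider: Provider = DemoProvider()
provider.create_server("web-01", "medium")
print(provider.list_servers())  # ['web-01']
```

Code written against `Provider` runs unchanged whether the concrete implementation talks to UpCloud, AWS, or the in-memory mock.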

Task Services

Location: provisioning/extensions/taskservs/

50+ Services in 18 categories:

  • Container runtimes (containerd, podman, crio)
  • Kubernetes (etcd, coredns, cilium, calico)
  • Storage (rook-ceph, mayastor, nfs)
  • Databases (postgres, redis, mongodb)
  • Networking (ip-aliases, proxy, kms)
  • Security (webhook, kms, oras)
  • Observability (prometheus, grafana, loki)
  • Development (gitea, coder, buildkit)
  • Hypervisor (kvm, qemu, libvirt)

Clusters

Location: provisioning/extensions/clusters/

9 Pre-built Templates:

  • web - Web service cluster
  • oci-reg - Container registry
  • git - Git hosting (Gitea)
  • buildkit - Build infrastructure
  • k8s-ha - HA Kubernetes
  • postgresql - HA PostgreSQL
  • cicd-argocd - GitOps CI/CD
  • cicd-tekton - Tekton pipelines

5. Configuration Layer

Nickel Schemas

Location: provisioning/schemas/

Structure (27 directories):

schemas/
├── main.ncl             # Entry point
├── lib/                 # Utilities
├── config/              # Settings
├── infrastructure/      # Servers, networks
├── operations/          # Workflows
├── deployment/          # Kubernetes
├── services/            # Service defs
└── versions.ncl         # Tool versions

3-File Pattern:

  1. contracts.ncl - Type definitions
  2. defaults.ncl - Default values
  3. main.ncl - Entry point + makers

Component Dependencies

CLI
  ├─ Configuration
  ├─ Workspace
  ├─ Validation
  ├─ Secrets
  └─ Providers

Providers
  └─ Orchestrator

Orchestrator
  ├─ Task Services
  ├─ Control Center
  └─ State Manager

Control Center
  ├─ Authorization
  ├─ Audit Logging
  └─ State Manager

Communication Patterns

Synchronous (Request-Response)

CLI → Orchestrator → Provider → Cloud API

Asynchronous (Queue)

CLI → Orchestrator (queue) → [Background execution]

Event-Driven

Provider Event → Orchestrator → State Update
                              → Control Center
                              → Monitoring
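The event-driven fan-out above — one provider event notifying several consumers — reduces to a simple publish/subscribe shape. An illustrative Python sketch with hypothetical event fields:

```python
subscribers = []

def subscribe(handler) -> None:
    subscribers.append(handler)

def publish(event: dict) -> None:
    """Deliver one event to every registered subscriber."""
    for handler in subscribers:
        handler(event)

seen = []
# State manager, control center, and monitoring each register a handler.
subscribe(lambda e: seen.append(("state", e["type"])))
subscribe(lambda e: seen.append(("audit", e["type"])))
subscribe(lambda e: seen.append(("metrics", e["type"])))

publish({"type": "server_created"})
print(seen)
# [('state', 'server_created'), ('audit', 'server_created'), ('metrics', 'server_created')]
```

Decoupling the publisher from its consumers means adding a new listener (say, a webhook forwarder) requires no change to the orchestrator itself.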

Integration Patterns

Design patterns for extending and integrating with Provisioning.

1. Provider Integration Pattern

Pattern: Add a new cloud provider to Provisioning.

2. Task Service Integration Pattern

Pattern: Add infrastructure component.

3. Cluster Template Pattern

Pattern: Create pre-configured cluster template.

4. Batch Workflow Pattern

Pattern: Create automation workflow for complex operations.

5. Custom Extension Pattern

Pattern: Create custom Nushell library.

6. Authorization Policy Pattern

Pattern: Define fine-grained access control via Cedar.

7. Webhook Integration

Pattern: Trigger Provisioning from external systems.

8. Monitoring Integration

Pattern: Export metrics and logs to monitoring systems.

9. CI/CD Integration

Pattern: Use Provisioning in automated pipelines.

10. MCP Tool Integration

Pattern: Add AI-powered tool via MCP.

Integration Scenarios

Multi-Cloud Deployment

Deploy across UpCloud, AWS, and Hetzner in single workflow.

GitOps Workflow

Git changes trigger infrastructure updates via webhooks.

Self-Service Deployment

Non-technical users request infrastructure via natural language.

Best Practices

  1. Use type-safe Nickel schemas
  2. Implement proper error handling
  3. Log all operations for audit trails
  4. Test extensions before production
  5. Document configuration & usage
  6. Version extensions independently
  7. Support backward compatibility
  8. Validate inputs & encrypt credentials

Architecture Decision Records

This section contains Architecture Decision Records (ADRs) documenting key architectural decisions and their rationale for the Provisioning platform.

ADR Index

Core Architecture Decisions

Security and Cryptography

Operations and Observability

Decision Format

Each ADR follows this structure:

  • Status: Accepted, Proposed, Deprecated, Superseded
  • Context: Problem statement and constraints
  • Decision: The chosen approach
  • Consequences: Benefits and trade-offs
  • Alternatives: Other options considered
  • References: Related ADRs and external docs

Rationale for ADRs

ADRs document the “why” behind architectural choices:

  1. Modular CLI - Scales command set without monolithic registration
  2. Workspace-First - Isolates infrastructure and supports multi-tenancy
  3. Nickel Source of Truth - Ensures type-safe configuration and prevents runtime errors
  4. Microservice Distribution - Enables independent scaling and deployment
  5. Communication Protocol - Balances synchronous needs with async event processing
  6. Post-Quantum Crypto - Protects against future quantum computing threats
  7. Multi-Layer Encryption - Defense in depth against data breaches
  8. Observability - Enables rapid troubleshooting and performance analysis
  9. SLO Management - Aligns infrastructure quality with business objectives
  10. Incident Automation - Reduces MTTR and improves system resilience

Cross-References

These ADRs interact with:

  • Platform Documentation - See provisioning/docs/src/architecture/
  • Features - See provisioning/docs/src/features/ for implementation details
  • Development Guides - See provisioning/docs/src/development/ for extending systems
  • Security Documentation - See provisioning/docs/src/security/ for compliance details
  • Operations Guides - See provisioning/docs/src/operations/ for deployment procedures

Examples

Real-world infrastructure as code examples demonstrating Provisioning across multi-cloud, Kubernetes, security, and operational scenarios.

Overview

This section contains production-ready examples showing how to:

  • Deploy infrastructure from basic single-cloud to complex multi-cloud environments
  • Orchestrate Kubernetes clusters with Provisioning automation
  • Implement security patterns including encryption, secrets management, and compliance
  • Build custom workflows for specialized infrastructure operations
  • Handle disaster recovery with backup strategies and failover procedures
  • Optimize costs through resource analysis and right-sizing
  • Migrate legacy systems from traditional infrastructure to cloud-native architectures
  • Test infrastructure as code with validation, policy checks, and integration tests

All examples use Nickel for type-safe configuration and are designed as learning resources and templates for your own deployments.

Quick Start Examples

Basic Infrastructure Setup

  • Basic Setup - Single-cloud with networking, compute, storage - perfect starting point

  • E-Commerce Platform - Multi-tier application across AWS and UpCloud with load balancing and databases

Multi-Cloud Deployments

Operational Examples

Advanced Patterns

Security and Compliance

Cloud Provider Specific

Configuration and Migration

Example Organization

Each example follows this structure:

example-name.md
├── Overview - What this example demonstrates
├── Prerequisites - Required setup
├── Architecture Diagram - Visual representation
├── Nickel Configuration - Complete, runnable configuration
├── Deployment Steps - Command-by-command instructions
├── Verification - How to validate deployment
├── Troubleshooting - Common issues and solutions
└── Next Steps - How to extend or customize

Learning Paths

I’m new to Provisioning

  1. Start with Basic Setup
  2. Read Real-World Scenario
  3. Try Kubernetes Deployment

I need multi-cloud infrastructure

  1. Review Multi-Cloud Deployment
  2. Study Hybrid Cloud Setup
  3. Implement Advanced Networking

I need to migrate existing infrastructure

  1. Start with Legacy System Migration
  2. Add Terraform Migration if applicable
  3. Set up GitOps Deployment

I need enterprise features

  1. Implement Compliance and Audit
  2. Set up Disaster Recovery
  3. Deploy Cost Governance
  4. Configure Secrets Rotation

Copy and Customize

All examples are self-contained and can be:

  1. Copied into your workspace and adapted
  2. Extended with additional resources and customizations
  3. Tested using Provisioning’s validation framework
  4. Deployed directly via provisioning apply

Use them as templates, learning resources, or reference implementations for your own infrastructure.

  • Configuration Guide → See provisioning/docs/src/infrastructure/nickel-guide.md
  • API Reference → See provisioning/docs/src/api-reference/
  • Development → See provisioning/docs/src/development/
  • Operations → See provisioning/docs/src/operations/

Basic Setup

Simple infrastructure setup examples for getting started with the Provisioning platform.

Single Server Deployment

Deploy a simple web server with UpCloud:

# workspace/infra/web-server.ncl
{
  servers = [
    {
      name = "web-01",
      provider = 'upcloud,
      plan = 'medium,
      zone = "fi-hel1",
      storage = [
        {size_gb = 50, type = 'ssd}
      ]
    }
  ]
}

Deploy:

provisioning workspace create basic-web
cd basic-web
cp ../examples/web-server.ncl infra/

provisioning deploy --workspace basic-web --yes

Three-Tier Application

Web frontend, application backend, database:

{
  servers = [
    {name = "web-01", provider = 'upcloud, plan = 'small, zone = "fi-hel1"},
    {name = "app-01", provider = 'upcloud, plan = 'medium, zone = "fi-hel1"},
    {name = "db-01", provider = 'upcloud, plan = 'large, zone = "fi-hel1",
     storage = [{size_gb = 100, type = 'ssd}]},
  ],

  task_services = [
    {name = "nginx", target = "web-01"},
    {name = "nodejs", target = "app-01"},
    {name = "postgresql", target = "db-01"},
  ]
}

Development Environment

Local development stack with Docker:

{
  servers = [
    {name = "dev-local", provider = 'local, plan = 'medium}
  ],

  task_services = [
    {name = "docker"},
    {name = "postgresql"},
    {name = "redis"},
  ]
}

References

Multi-Cloud Examples

Deploy infrastructure across multiple cloud providers for redundancy and geographic distribution.

Primary-Backup Configuration

UpCloud primary in Europe, AWS backup in US:

{
  servers = [
    # Primary (UpCloud EU)
    {name = "web-eu", provider = 'upcloud, zone = "fi-hel1", plan = 'medium},
    {name = "db-eu", provider = 'upcloud, zone = "fi-hel1", plan = 'large},

    # Backup (AWS US)
    {name = "web-us", provider = 'aws, zone = "us-east-1a", plan = 't3.medium},
    {name = "db-us", provider = 'aws, zone = "us-east-1a", plan = 'm5.large},
  ],

  replication = {
    enabled = true,
    pairs = [
      {primary = "db-eu", standby = "db-us", mode = 'async}
    ]
  }
}

Geographic Distribution

Deploy to multiple regions for low latency:

{
  servers = [
    {name = "web-eu", provider = 'upcloud, zone = "fi-hel1"},
    {name = "web-us", provider = 'aws, zone = "us-west-2a"},
    {name = "web-asia", provider = 'aws, zone = "ap-southeast-1a"},
  ],

  load_balancing = {
    global = true,
    geo_routing = true
  }
}

References

Kubernetes Deployment Examples

Deploy production-ready Kubernetes clusters with the Provisioning platform.

Basic Kubernetes Cluster

3-node cluster with Cilium CNI:

{
  task_services = [
    {
      name = "kubernetes",
      config = {
        control_plane = {nodes = 3, plan = 'medium},
        workers = [{name = "default", nodes = 3, plan = 'large}],
        networking = {
          cni = 'cilium,
          pod_cidr = "10.42.0.0/16",
          service_cidr = "10.43.0.0/16"
        }
      }
    }
  ]
}

Production Cluster with Storage

Kubernetes with Rook-Ceph storage:

{
  task_services = [
    {
      name = "kubernetes",
      config = {
        control_plane = {nodes = 3, plan = 'medium},
        workers = [
          {name = "general", nodes = 5, plan = 'large},
          {name = "storage", nodes = 3, plan = 'xlarge,
           storage = [{size_gb = 500, type = 'ssd}]}
        ],
        networking = {cni = 'cilium}
      }
    },
    {
      name = "rook-ceph",
      config = {
        storage_nodes = ["storage-0", "storage-1", "storage-2"],
        osd_per_device = 1
      }
    }
  ]
}

References

Custom Workflow Examples

Build complex deployment workflows with dependency management and parallel execution.

Multi-Stage Deployment

{
  workflows = [{
    name = "app-deployment",
    steps = [
      {name = "provision-infrastructure", type = 'provision},
      {name = "install-kubernetes", type = 'task, depends_on = ["provision-infrastructure"]},
      {name = "deploy-application", type = 'task, depends_on = ["install-kubernetes"]},
      {name = "configure-monitoring", type = 'task, depends_on = ["deploy-application"]}
    ]
  }]
}

Parallel Regional Deployment

{
  workflows = [{
    name = "global-rollout",
    steps = [
      {name = "deploy-eu", type = 'task},
      {name = "deploy-us", type = 'task},
      {name = "deploy-asia", type = 'task},
      {name = "configure-dns", type = 'configure,
       depends_on = ["deploy-eu", "deploy-us", "deploy-asia"]}
    ]
  }]
}

References

Security Configuration Examples

Security configuration examples for authentication, encryption, and secrets management.

Complete Security Configuration

{
  security = {
    authentication = {
      enabled = true,
      jwt_algorithm = "RS256",
      mfa_required = true
    },

    secrets = {
      backend = "secretumvault",
      url = "https://vault.example.com",
      auto_rotate = true,
      rotation_days = 90
    },

    encryption = {
      at_rest = true,
      algorithm = "AES-256-GCM",
      kms_backend = "secretumvault"
    },

    audit = {
      enabled = true,
      retention_days = 2555,
      export_format = "json"
    }
  }
}

SecretumVault Integration

# Configure SecretumVault
provisioning config set security.secrets.backend secretumvault
provisioning config set security.secrets.url http://localhost:8200

# Store secrets
provisioning vault put database/password --value="secret123"

# Retrieve secrets
provisioning vault get database/password

Encrypted Infrastructure Configuration

{
  providers.upcloud = {
    username = "admin",
    password = std.secret "UPCLOUD_PASSWORD"  # Encrypted
  },

  databases = [{
    name = "production-db",
    password = std.secret "DB_PASSWORD"  # Encrypted
  }]
}

References

Troubleshooting

Systematic guides and debugging procedures for diagnosing and resolving issues with the Provisioning platform.

Overview

This section helps you:

  • Solve common issues - Database connection errors, authentication failures, deployment failures
  • Debug problems - Diagnostic tools, log analysis, tracing execution paths
  • Analyze logs - Log aggregation, filtering, searching, pattern recognition
  • Understand errors - Error message interpretation and root cause analysis
  • Get support - Knowledge base, community resources, professional support

Organized by problem type and component for quick navigation.

Troubleshooting Guides

Quick Problem Solving

  • Common Issues - Authentication failures, deployment errors, configuration, resource limits, network problems

  • Debug Guide - Debug logging, verbose output, trace execution, collect diagnostics, analyze stack traces

  • Logs Analysis - Find logs, search techniques, log patterns, interpreting errors, diagnostics

Component-Specific Troubleshooting

Each microservice and component has its own troubleshooting section:

  • Orchestrator Issues - Workflow failures, scheduling problems, state inconsistencies
  • Control Center Issues - API errors, permission problems, configuration issues
  • Vault Service Issues - Secret access failures, key rotation problems, authentication errors
  • Detector Issues - Analysis failures, false positives, configuration problems
  • Extension Registry Issues - Provider loading, dependency resolution, versioning conflicts

Infrastructure and Configuration

  • Configuration Problems - Nickel syntax errors, schema validation failures, type mismatches
  • Provider Issues - Authentication failures, API limits, resource creation failures
  • Task Service Failures - Service-specific errors, timeout issues, state management problems
  • Network Problems - Connectivity issues, DNS resolution, firewall rules, certificate problems

Problem Diagnosis Flowchart

Issue Occurs
    ↓
Is it an authentication issue? → See [Common Issues](./common-issues.md) - Authentication
    ↓ No
Is it a deployment failure? → See [Common Issues](./common-issues.md) - Deployment
    ↓ No
Is it a configuration error? → See [Debug Guide](./debug-guide.md) - Configuration
    ↓ No
Enable debug logging → See [Debug Guide](./debug-guide.md)
    ↓
Collect logs and traces → See [Logs Analysis](./logs-analysis.md)
    ↓
Analyze patterns → Identify root cause
    ↓
Apply fix or escalate

Quick Reference: Common Problems

| Problem | Solution | Guide |
|---|---|---|
| “Authentication failed” | Check credentials, enable MFA | Common Issues |
| “Permission denied” | Verify RBAC policies, check Cedar rules | Common Issues |
| “Deployment failed” | Check logs, verify resources, test connectivity | Debug Guide |
| “Configuration invalid” | Validate Nickel schema, check types | Common Issues |
| “Provider unavailable” | Check API keys, verify connectivity | Common Issues |
| “Resource creation failed” | Check resource limits, verify account | Debug Guide |
| “Timeout” | Increase timeouts, check performance | Debug Guide |
| “Database error” | Check connections, verify schema | Common Issues |

Debugging Workflow

  1. Reproduce - Can you consistently reproduce the issue?
  2. Enable Debug Logging - Set RUST_LOG=debug and PROVISIONING_LOG_LEVEL=debug
  3. Collect Evidence - Logs, configuration, error messages, stack traces
  4. Analyze Patterns - Look for errors, warnings, unusual timing
  5. Identify Cause - Root cause analysis
  6. Test Fix - Verify the fix resolves the issue
  7. Prevent Recurrence - Update documentation, add tests

Enable Diagnostic Logging

# Set log level to debug
export RUST_LOG=debug
export PROVISIONING_LOG_LEVEL=debug

# Collect logs to file
provisioning config set logging.file /var/log/provisioning.log
provisioning config set logging.level debug

# Enable verbose output
provisioning --verbose <command>

# Run with tracing
RUST_BACKTRACE=1 provisioning <command>

Common Error Codes

| Code | Meaning | Action |
|---|---|---|
| 401 | Unauthorized | Check authentication credentials |
| 403 | Forbidden | Check authorization policies |
| 404 | Not Found | Verify resource exists |
| 409 | Conflict | Resolve state conflicts |
| 422 | Invalid | Verify configuration schema |
| 500 | Internal Error | Check server logs |
| 503 | Service Unavailable | Wait for service to recover |
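
Of these, 503 is typically transient, so clients can retry with an increasing pause instead of failing immediately. A minimal retry-with-backoff sketch (illustrative; the platform's clients may implement this differently):

```python
import time

RETRYABLE = {503}  # Service Unavailable: wait for the service to recover

def with_retry(call, max_attempts=3, base_delay=0.01):
    """Invoke `call` until it returns a non-retryable status,
    doubling the sleep after each retryable failure."""
    for attempt in range(max_attempts):
        status, body = call()
        if status not in RETRYABLE:
            return status, body
        time.sleep(base_delay * (2 ** attempt))
    return status, body

# Simulate a service that recovers on the third attempt
responses = iter([(503, None), (503, None), (200, "ok")])
print(with_retry(lambda: next(responses)))
# → (200, 'ok')
```

Permanent errors such as 401 or 422 are returned immediately, since retrying cannot fix bad credentials or an invalid schema.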

Escalation Paths

Community Support

  1. Check Common Issues
  2. Search community forums
  3. Ask on GitHub discussions

Professional Support

  1. Open a support ticket
  2. Provide: logs, configuration, reproduction steps
  3. Wait for response

Emergency Issues (Security, Data Loss)

  1. Contact security team immediately
  2. Provide all evidence
  3. Document timeline

Support Resources

  • Documentation → Complete guides in provisioning/docs/src/
  • GitHub Issues → Community issues and discussions
  • Slack Community → Real-time community support
  • Email Support → professional@provisioning.io
  • Chat Support → Available during business hours
  • Operations Guide → See provisioning/docs/src/operations/
  • Architecture → See provisioning/docs/src/architecture/
  • Features → See provisioning/docs/src/features/
  • Development → See provisioning/docs/src/development/
  • Examples → See provisioning/docs/src/examples/

Common Issues

Debug Guide

Logs Analysis

Getting Help

AI & Machine Learning

Provisioning includes comprehensive AI capabilities for infrastructure automation via natural language, intelligent configuration suggestions, and anomaly detection.

Overview

The AI system consists of three integrated components:

  1. TypeDialog AI Backends - Interactive form intelligence and agent automation
  2. AI Service Microservice - Central AI processing and coordination
  3. Core AI Libraries - Nushell query processing and LLM integration

Key Capabilities

Natural Language Infrastructure

Request infrastructure changes in plain English:

# Natural language request
provisioning ai "Create 3 web servers with load balancing and auto-scaling"

# Returns:
# - Parsed infrastructure requirements
# - Generated Nickel configuration
# - Deployment confirmation

Intelligent Configuration

AI suggests optimal configurations based on context:

  • Database selection and tuning
  • Network topology recommendations
  • Security policy generation
  • Resource allocation optimization

Anomaly Detection

Continuous monitoring and intelligent alerting:

  • Infrastructure health anomalies
  • Performance pattern detection
  • Security issue identification
  • Predictive alerting

Components at a Glance

| Component | Purpose | Technology |
|---|---|---|
| typedialog-ai | Form intelligence & suggestions | HTTP server, SurrealDB |
| typedialog-ag | AI agents & workflow automation | Type-safe agents, Nickel transpilation |
| ai-service | Central AI microservice | Rust, LLM integration |
| rag | Knowledge base retrieval | Semantic search, embeddings |
| mcp-server | Model Context Protocol | AI tool interface |
| detector | Anomaly detection system | Pattern recognition |

Quick Start

Enable AI Features

# Install AI tools
provisioning install ai-tools

# Configure AI service
provisioning ai configure --provider openai --model gpt-4

# Test AI capabilities
provisioning ai test

Use Natural Language

# Simple request
provisioning ai "Create a Kubernetes cluster"

# Complex request with options
provisioning ai "Deploy PostgreSQL HA cluster with replication in AWS, backup to S3"

# Get help on AI features
provisioning help ai

Architecture

The AI system follows a layered architecture:

┌─────────────────────────────────┐
│  User Interface Layer            │
│  • Natural language input        │
│  • TypeDialog AI forms           │
│  • Chat interface                │
└────────────┬────────────────────┘
             ↓
┌─────────────────────────────────┐
│  AI Orchestration Layer          │
│  • AI Service (Rust)             │
│  • Query processing (Nushell)    │
│  • Intent recognition            │
└────────────┬────────────────────┘
             ↓
┌─────────────────────────────────┐
│  Knowledge & Processing Layer    │
│  • RAG (Retrieval)               │
│  • LLM Integration               │
│  • MCP Server                    │
│  • Detector (anomalies)          │
└────────────┬────────────────────┘
             ↓
┌─────────────────────────────────┐
│  Infrastructure Layer            │
│  • Nickel configuration          │
│  • Deployment execution          │
│  • Monitoring & feedback         │
└─────────────────────────────────┘

Topics

Configuration

Environment Variables

# LLM Provider
export PROVISIONING_AI_PROVIDER=openai        # openai, anthropic, local
export PROVISIONING_AI_MODEL=gpt-4            # Model identifier
export PROVISIONING_AI_API_KEY=sk-...         # API key

# AI Service
export PROVISIONING_AI_SERVICE_PORT=9091      # AI service port
export PROVISIONING_AI_ENABLE_ANOMALY=true    # Enable detector
export PROVISIONING_AI_RAG_THRESHOLD=0.75     # Similarity threshold

Configuration File

# ~/.config/provisioning/ai.yaml
ai:
  enabled: true
  provider: openai
  model: gpt-4
  api_key: ${PROVISIONING_AI_API_KEY}

  service:
    port: 9091
    timeout: 30
    max_retries: 3

  typedialog:
    ai_enabled: true
    ag_enabled: true
    suggestions: true

  rag:
    enabled: true
    similarity_threshold: 0.75
    max_results: 5

  detector:
    enabled: true
    update_interval: 60
    alert_threshold: 0.8

Use Cases

1. Infrastructure from Description

Describe infrastructure in natural language, get Nickel configuration:

provisioning ai deploy "
  Create a production Kubernetes cluster with:
  - 3 control planes
  - 5 worker nodes
  - HA PostgreSQL (3 nodes)
  - Prometheus monitoring
  - Encrypted networking
"

2. Configuration Assistance

Get AI suggestions while filling out forms:

provisioning setup profile
# TypeDialog shows suggestions based on context
# Database recommendations based on workload
# Security settings optimized for environment

3. Troubleshooting

AI analyzes logs and suggests fixes:

provisioning ai troubleshoot --service orchestrator

# Output:
# Issue detected: High memory usage
# Likely cause: Task queue backlog
# Suggestion: Scale orchestrator replicas to 3
# Command: provisioning orchestrator scale --replicas 3

4. Anomaly Detection

Continuous monitoring with intelligent alerts:

provisioning ai anomalies --since 1h

# Output:
# ⚠️  Unusual pattern detected
# Time: 2026-01-16T01:47:00Z
# Service: control-center
# Metric: API response time
# Baseline: 45ms → Current: 320ms (+611%)
# Likelihood: Query performance regression
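
The numbers in that alert come from comparing the current metric to its learned baseline. The detector's real logic is more sophisticated, but the core relative-deviation check can be sketched as follows (the threshold value here is a hypothetical illustration):

```python
def percent_deviation(baseline, current):
    """Relative change of a metric against its baseline, in percent."""
    return (current - baseline) / baseline * 100

def is_anomalous(baseline, current, threshold_pct=200):
    """Flag metrics that moved more than threshold_pct from baseline."""
    return abs(percent_deviation(baseline, current)) >= threshold_pct

# API response time: baseline 45ms, current 320ms
print(round(percent_deviation(45, 320)))  # → 611
print(is_anomalous(45, 320))              # → True
```

A 45ms → 320ms jump is a +611% deviation, well past any reasonable threshold, which is why the alert fires.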

Limitations

  • LLM Dependency: Requires external LLM provider (OpenAI, Anthropic, etc.)
  • Network Required: Cloud-based LLM providers need internet connectivity
  • Context Window: Large infrastructures may exceed LLM context limits
  • Cost: API calls incur per-token charges
  • Latency: Natural language processing adds response latency (2-5 seconds)

Configuration Files

Key files for AI configuration:

| File | Purpose |
|---|---|
| .typedialog/ai.db | AI SurrealDB database (typedialog-ai) |
| .typedialog/agent-*.yaml | AI agent definitions (typedialog-ag) |
| ~/.config/provisioning/ai.yaml | User AI settings |
| provisioning/core/versions.ncl | TypeDialog versions |
| core/nulib/lib_provisioning/ai/ | Core AI libraries |
| platform/crates/ai-service/ | AI service crate |

Performance

Typical Latencies

| Operation | Latency |
|---|---|
| Simple request parsing | 100-200ms |
| LLM inference | 2-5 seconds |
| Configuration generation | 500ms-1s |
| Anomaly detection | 50-100ms |

Scalability

  • Concurrent requests: 100+ (load balanced)
  • Query processing: 10,000+ queries/second
  • RAG similarity search: <50ms for 1M documents
  • Anomaly detection: Real-time on 1000+ metrics

Security

API Keys

  • Stored encrypted in vault-service
  • Never logged or persisted in plain text
  • Rotated automatically (configurable)
  • Audit trail for all API usage

Data Privacy

  • Natural language queries not stored by default
  • LLM provider agreements (OpenAI terms, etc.)
  • Local-only RAG option available
  • GDPR compliance support

AI Architecture

Complete system architecture of Provisioning’s AI capabilities, from user interface through infrastructure generation.

System Overview

┌──────────────────────────────────────────────────┐
│  User Interface Layer                            │
│  • CLI (natural language)                        │
│  • TypeDialog AI forms                           │
│  • Interactive wizards                           │
│  • Web dashboard                                 │
└────────────────────┬─────────────────────────────┘
                     ↓
┌──────────────────────────────────────────────────┐
│  Request Processing Layer                        │
│  • Intent recognition                            │
│  • Entity extraction                             │
│  • Context parsing                               │
│  • Request validation                            │
└────────────────────┬─────────────────────────────┘
                     ↓
┌──────────────────────────────────────────────────┐
│  Knowledge & Retrieval Layer (RAG)              │
│  • Document embedding                            │
│  • Vector similarity search                      │
│  • Keyword matching (BM25)                       │
│  • Hybrid ranking                                │
└────────────────────┬─────────────────────────────┘
                     ↓
┌──────────────────────────────────────────────────┐
│  LLM Integration Layer                           │
│  • MCP tool registration                         │
│  • Context augmentation                          │
│  • Prompt engineering                            │
│  • LLM API calls (OpenAI, Anthropic, etc.)      │
└────────────────────┬─────────────────────────────┘
                     ↓
┌──────────────────────────────────────────────────┐
│  Configuration Generation Layer                  │
│  • Nickel code generation                        │
│  • Schema validation                             │
│  • Constraint checking                           │
│  • Cost estimation                               │
└────────────────────┬─────────────────────────────┘
                     ↓
┌──────────────────────────────────────────────────┐
│  Execution & Feedback Layer                      │
│  • DAG planning                                  │
│  • Dry-run simulation                            │
│  • Deployment execution                          │
│  • Performance monitoring                        │
└──────────────────────────────────────────────────┘

Component Architecture

1. User Interface Layer

Entry Points:

Natural Language Input
    ├─ CLI: provisioning ai "create kubernetes cluster"
    ├─ Interactive: provisioning ai interactive
    ├─ Forms: TypeDialog AI-enhanced forms
    └─ Web Dashboard: /ai/infrastructure-builder

Processing:

  • Tokenization and normalization
  • Command pattern matching
  • Ambiguity resolution
  • Confidence scoring

2. Intent Recognition

User Request
    ↓
Intent Classification
    ├─ Create infrastructure (60%)
    ├─ Modify configuration (25%)
    ├─ Query knowledge (10%)
    └─ Troubleshoot issue (5%)
    ↓
Entity Extraction
    ├─ Resource type (server, database, cluster)
    ├─ Cloud provider (AWS, UpCloud, Hetzner)
    ├─ Count/Scale (3 nodes, 10GB)
    ├─ Requirements (HA, encrypted, monitoring)
    └─ Constraints (budget, region, environment)
    ↓
Request Structure
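
The entity-extraction step above can be illustrated with a toy pattern matcher. Production intent recognition uses NLP models rather than regular expressions; this sketch only shows the shape of the output structure:

```python
import re

def extract_entities(request: str) -> dict:
    """Toy entity extractor: pulls resource counts, providers,
    and feature flags out of a natural language request."""
    text = request.lower()
    entities = {}
    # e.g. "3 web servers" → count=3, resource=server
    m = re.search(r"(\d+)\s+(?:\w+\s+)?(server|node|worker|replica|cluster)s?", text)
    if m:
        entities["count"] = int(m.group(1))
        entities["resource"] = m.group(2)
    providers = [p for p in ("aws", "upcloud", "hetzner") if p in text]
    if providers:
        entities["providers"] = providers
    if "load balanc" in text:
        entities["load_balancer"] = True
    return entities

print(extract_entities("Create 3 web servers with load balancing on AWS"))
# → {'count': 3, 'resource': 'server', 'providers': ['aws'], 'load_balancer': True}
```

The resulting structure is what downstream layers consume: counts and resource types feed configuration generation, while providers and flags become deployment constraints.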

3. RAG Knowledge Retrieval

Embedding Process:

Query: "Create 3 web servers with load balancer"
    ↓
Embed Query → Vector [0.234, 0.567, 0.891, ...]
    ↓
Search Relevant Documents
    ├─ Vector similarity (semantic)
    ├─ BM25 keyword matching (syntactic)
    └─ Hybrid ranking
    ↓
Top Results:
    1. "Web Server HA Patterns" (0.94 similarity)
    2. "Load Balancing Best Practices" (0.87)
    3. "Auto-Scaling Configuration" (0.76)
    ↓
Extract Context & Augment Prompt
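
The hybrid ranking step blends the semantic and keyword signals into one score. A minimal sketch of such a weighted blend, assuming both scores are already normalized to [0, 1] (the actual ranking function and weight are implementation details):

```python
def hybrid_rank(docs, alpha=0.7):
    """Rank documents by a weighted blend of semantic (vector) and
    keyword (BM25) scores; alpha weights the semantic side."""
    scored = sorted(
        docs,
        key=lambda d: alpha * d[1] + (1 - alpha) * d[2],
        reverse=True,
    )
    return [title for title, _, _ in scored]

# (title, vector_score, bm25_score)
docs = [
    ("Web Server HA Patterns", 0.94, 0.80),
    ("Load Balancing Best Practices", 0.87, 0.90),
    ("Auto-Scaling Configuration", 0.76, 0.40),
]
print(hybrid_rank(docs))
```

With alpha = 0.7 the semantic score dominates, but a strong keyword match can still lift a document past a semantically closer one when the vector scores are close.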

Knowledge Organization:

knowledge/
├── infrastructure/           (450 docs)
│   ├── kubernetes/
│   ├── databases/
│   ├── networking/
│   └── web-services/
├── best-practices/          (300 docs)
│   ├── high-availability/
│   ├── disaster-recovery/
│   └── performance/
├── providers/               (250 docs)
│   ├── aws/
│   ├── upcloud/
│   └── hetzner/
└── security/                (200 docs)
    ├── encryption/
    ├── authentication/
    └── compliance/

4. LLM Integration (MCP)

Tool Registration:

LLM (GPT-4, Claude 3)
    ↓
MCP Server (provisioning-mcp)
    ↓
Available Tools:
    ├─ create_infrastructure
    ├─ analyze_configuration
    ├─ generate_policies
    ├─ estimate_costs
    ├─ check_compatibility
    ├─ validate_nickel
    ├─ query_knowledge_base
    └─ get_recommendations
    ↓
Tool Execution

Prompt Engineering Pipeline:

Base Prompt Template
    ↓
Add Context (RAG results)
    ↓
Add Constraints
    ├─ Budget limit
    ├─ Region restrictions
    ├─ Compliance requirements
    └─ Performance targets
    ↓
Add Examples
    ├─ Successful deployments
    ├─ Error patterns
    └─ Best practices
    ↓
Enhanced Prompt
    ↓
LLM Inference

5. Configuration Generation

Nickel Code Generation:

LLM Output (structured)
    ↓
Nickel Template Filling
    ├─ Server definitions
    ├─ Network configuration
    ├─ Storage setup
    └─ Monitoring config
    ↓
Generated Nickel File
    ↓
Syntax Validation
    ↓
Schema Validation (Type Checking)
    ↓
Constraint Verification
    ├─ Resource limits
    ├─ Budget constraints
    ├─ Compliance policies
    └─ Provider capabilities
    ↓
Cost Estimation
    ↓
Final Configuration

6. Execution & Feedback

Deployment Planning:

Configuration
    ↓
DAG Generation (Directed Acyclic Graph)
    ├─ Task decomposition
    ├─ Dependency analysis
    ├─ Parallelization
    └─ Scheduling
    ↓
Dry-Run Simulation
    ├─ Check resources available
    ├─ Validate API access
    ├─ Estimate time
    └─ Identify risks
    ↓
Execution with Checkpoints
    ├─ Create resources
    ├─ Monitor progress
    ├─ Collect metrics
    └─ Save checkpoints
    ↓
Post-Deployment
    ├─ Verify functionality
    ├─ Run health checks
    ├─ Collect performance data
    └─ Store feedback for future improvements

Data Flow Examples

Example 1: Simple Request

User: "Create 3 web servers with load balancer"
    ↓
Intent: Create Infrastructure
Entities: type=server, count=3, load_balancer=true
    ↓
RAG Retrieval: "Web Server Patterns", "Load Balancing"
    ↓
LLM Prompt:
"Generate Nickel config for 3 web servers with load balancer.
Context: [web server best practices from knowledge base]
Constraints: High availability, auto-scaling enabled"
    ↓
Generated Nickel:
{
  servers = [
    {name = "web-01", cpu = 4, memory = 8},
    {name = "web-02", cpu = 4, memory = 8},
    {name = "web-03", cpu = 4, memory = 8}
  ]
  load_balancer = {
    type = "application"
    health_check = "/health"
  }
}
    ↓
Configuration Generated & Validated ✓
    ↓
User Approval
    ↓
Deployment

Example 2: Complex Multi-Cloud Request

User: "Deploy Kubernetes to AWS, UpCloud, and Hetzner with replication"
    ↓
Intent: Multi-Cloud Infrastructure
Entities: type=kubernetes, providers=[aws, upcloud, hetzner], replicas=3
    ↓
RAG Retrieval:
    - "Multi-Cloud Kubernetes Patterns"
    - "Inter-Region Replication"
    - "AWS Kubernetes Setup"
    - "UpCloud Kubernetes Setup"
    - "Hetzner Kubernetes Setup"
    ↓
LLM Processes:
    1. Analyze multi-cloud topology
    2. Identify networking requirements
    3. Plan data replication strategy
    4. Consider regional compliance
    ↓
Generated Nickel:
    - Infrastructure definitions for each provider
    - Inter-region networking configuration
    - Replication topology
    - Failover policies
    ↓
Cost Breakdown:
    AWS: $2,500/month
    UpCloud: $1,800/month
    Hetzner: $1,500/month
    Total: $5,800/month
    ↓
Compliance Check: EU GDPR ✓, US HIPAA ✓
    ↓
Ready for Deployment

Key Technologies

LLM Providers

Supported external LLM providers:

| Provider | Models | Latency | Cost |
|---|---|---|---|
| OpenAI | GPT-4, GPT-3.5 | 2-3s | $0.05-0.15/1K tokens |
| Anthropic | Claude 3 Opus | 2-4s | $0.015-0.03/1K tokens |
| Local (Ollama) | Llama 2, Mistral | 5-10s | Free |

Vector Databases

  • SurrealDB (default): Embedded vector database with HNSW indexing
  • Pinecone: Cloud vector database (optional)
  • Milvus: Open-source vector database (optional)

Embedding Models

  • text-embedding-3-small (OpenAI): 1,536 dimensions
  • text-embedding-3-large (OpenAI): 3,072 dimensions
  • all-MiniLM-L6-v2 (local): 384 dimensions
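
Regardless of dimensionality, retrieval compares embeddings with cosine similarity and keeps only matches above the configured threshold (0.75 by default). A minimal sketch with toy 3-dimensional vectors:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors; real embeddings have 384 to 3,072 dimensions
query = [0.2, 0.5, 0.8]
doc = [0.25, 0.45, 0.85]
print(cosine_similarity(query, doc) > 0.75)  # clears the default threshold
# → True
```

In production the vector database (SurrealDB with HNSW indexing) performs this comparison approximately over millions of stored embeddings rather than exhaustively.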

Performance Characteristics

Latency Breakdown

For a typical infrastructure creation request:

| Stage | Latency | Details |
|---|---|---|
| Intent Recognition | 50-100ms | Local NLP |
| RAG Retrieval | 50-100ms | Vector search |
| LLM Inference | 2-5s | External API |
| Nickel Generation | 100-200ms | Template filling |
| Validation | 200-500ms | Type checking |
| Total | 2.5-6 seconds | End-to-end |

Concurrency

  • Concurrent Requests: 100+ (with load balancing)
  • RAG QPS: 50+ searches/second
  • LLM Throughput: 10+ concurrent requests per API key
  • Memory: 500MB-2GB (depends on cache size)

Security Architecture

Data Protection

User Input
    ↓
Input Sanitization
    ├─ Remove PII
    ├─ Validate constraints
    └─ Check permissions
    ↓
Processing (encrypted in transit)
    ├─ TLS 1.3 to LLM provider
    ├─ Secrets stored in vault-service
    └─ Credentials never logged
    ↓
Generated Configuration
    ├─ Encrypted at rest (AES-256)
    ├─ Signed for integrity
    └─ Audit trail maintained
    ↓
Output

Access Control

  • API key validation
  • RBAC permission checking
  • Rate limiting per user/key
  • Audit logging of all operations
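
Per-user/key rate limiting is commonly implemented as a token bucket: requests spend tokens, tokens refill at a fixed rate, and the bucket capacity bounds burst size. A sketch of the idea (the platform's actual limiter may differ):

```python
import time

class TokenBucket:
    """Token-bucket rate limiter for a single user or API key."""
    def __init__(self, rate, capacity):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=5, capacity=2)
print([bucket.allow() for _ in range(3)])  # burst of 2 allowed, third denied
# → [True, True, False]
```

Keeping one bucket per API key bounds both sustained request rate and burst size, which also protects the downstream LLM provider quota.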

Extensibility

Custom Tools

Register custom tools with MCP:

// Custom tool example
register_tool("custom-validator", |config| {
    validate_custom_requirements(&config)
});

Custom RAG Documents

Add domain-specific knowledge:

provisioning ai knowledge import \
  --source ./custom-docs \
  --category infrastructure

Fine-tuning (Future)

  • Support for fine-tuned LLM models
  • Custom prompt templates
  • Organization-specific knowledge bases

TypeDialog AI & AG Integration

TypeDialog provides two AI-powered tools for Provisioning: typedialog-ai (configuration assistant) and typedialog-ag (agent automation).

TypeDialog Components

typedialog-ai v0.1.0

AI Assistant - HTTP server backend for intelligent form suggestions and infrastructure recommendations.

Purpose: Enhance interactive forms with AI-powered suggestions and natural language parsing.

Architecture:

TypeDialog Form
    ↓
typedialog-ai HTTP Server
    ↓
SurrealDB Backend
    ↓
LLM Provider (OpenAI, Anthropic, etc.)
    ↓
Suggestions → Deployed Config

Key Features:

  • Form Intelligence: Context-aware field suggestions
  • Database Recommendations: Suggest database type/configuration based on workload
  • Network Optimization: Generate optimal network topology
  • Security Policies: AI-generated Cedar policies
  • Cost Estimation: Predict infrastructure costs

Installation:

# Via provisioning script
provisioning install ai-tools

# Manual installation
wget https://github.com/typedialog/typedialog-ai/releases/download/v0.1.0/typedialog-ai-<os>-<arch>
chmod +x typedialog-ai
mv typedialog-ai ~/.local/bin/

Usage:

# Start AI server
typedialog ai serve --db-path ~/.typedialog/ai.db --port 9000

# Test connection
curl http://localhost:9000/health

# Get suggestion for database
curl -X POST http://localhost:9000/suggest/database \
  -H "Content-Type: application/json" \
  -d '{"workload": "transactional", "size": "1TB", "replicas": 3}'

# Response:
# {"suggestion": "PostgreSQL 15 with pgvector", "confidence": 0.92}

Configuration:

# ~/.typedialog/ai-config.yaml
typedialog-ai:
  port: 9000
  db_path: ~/.typedialog/ai.db
  loglevel: info

  llm:
    provider: openai              # or: anthropic, local
    model: gpt-4
    api_key: ${OPENAI_API_KEY}
    temperature: 0.7

  features:
    form_suggestions: true
    database_recommendations: true
    network_optimization: true
    security_policy_generation: true
    cost_estimation: true

  cache:
    enabled: true
    ttl: 3600

Database Schema:

-- SurrealDB schema for AI suggestions
DEFINE TABLE ai_suggestions SCHEMAFULL;
DEFINE FIELD timestamp ON ai_suggestions TYPE datetime DEFAULT now();
DEFINE FIELD context ON ai_suggestions TYPE object;
DEFINE FIELD suggestion ON ai_suggestions TYPE string;
DEFINE FIELD confidence ON ai_suggestions TYPE float;
DEFINE FIELD accepted ON ai_suggestions TYPE bool;

DEFINE TABLE ai_models SCHEMAFULL;
DEFINE FIELD name ON ai_models TYPE string;
DEFINE FIELD version ON ai_models TYPE string;
DEFINE FIELD provider ON ai_models TYPE string;

Endpoints:

| Endpoint | Method | Purpose |
|----------|--------|---------|
| /health | GET | Health check |
| /suggest/database | POST | Database recommendations |
| /suggest/network | POST | Network topology |
| /suggest/security | POST | Security policies |
| /estimate/cost | POST | Cost estimation |
| /parse/natural-language | POST | Parse natural language |
| /feedback | POST | Store suggestion feedback |

typedialog-ag v0.1.0

AI Agents - Type-safe agents for automation workflows and Nickel transpilation.

Purpose: Define complex automation workflows using type-safe agent descriptions, then transpile to executable Nickel.

Architecture:

Agent Definition (.agent.yaml)
    ↓
typedialog-ag Type Checker
    ↓
Agent Execution Plan
    ↓
Nickel Transpilation
    ↓
Provisioning Execution

Key Features:

  • Type-Safe Agents: Strongly-typed agent definitions
  • Workflow Automation: Chain multiple infrastructure tasks
  • Nickel Transpilation: Generate Nickel IaC automatically
  • Agent Orchestration: Parallel and sequential execution
  • Rollback Support: Automatic rollback on failure

Installation:

# Via provisioning script
provisioning install ai-tools

# Manual installation
wget https://github.com/typedialog/typedialog-ag/releases/download/v0.1.0/typedialog-ag-<os>-<arch>
chmod +x typedialog-ag
mv typedialog-ag ~/.local/bin/

Agent Definition Syntax:

# provisioning/workflows/deploy-k8s.agent.yaml
version: "1.0"
agent: deploy-k8s
description: "Deploy HA Kubernetes cluster with observability stack"

types:
  CloudProvider:
    enum: ["aws", "upcloud", "hetzner"]
  NodeConfig:
    cpu: int           # 2..64
    memory: int        # 4..256 (GB)
    disk: int          # 10..1000 (GB)

input:
  provider: CloudProvider
  name: string         # cluster name
  nodes: int           # 3..100
  node_config: NodeConfig
  enable_monitoring: bool = true
  enable_backup: bool = true

workflow:
  - name: validate
    task: validate_cluster_config
    args:
      provider: $input.provider
      nodes: $input.nodes
      node_config: $input.node_config

  - name: create_network
    task: create_vpc
    depends_on: [validate]
    args:
      provider: $input.provider
      cidr: "10.0.0.0/16"

  - name: create_nodes
    task: create_nodes
    depends_on: [create_network]
    parallel: true
    args:
      provider: $input.provider
      count: $input.nodes
      config: $input.node_config

  - name: install_kubernetes
    task: install_kubernetes
    depends_on: [create_nodes]
    args:
      nodes: $create_nodes.output.node_ids
      version: "1.28.0"

  - name: add_monitoring
    task: deploy_observability_stack
    depends_on: [install_kubernetes]
    when: $input.enable_monitoring
    args:
      cluster_name: $input.name
      storage_class: "ebs"

  - name: setup_backup
    task: configure_backup
    depends_on: [install_kubernetes]
    when: $input.enable_backup
    args:
      cluster_name: $input.name
      backup_interval: "daily"

output:
  cluster_name: string
  cluster_id: string
  kubeconfig_path: string
  monitoring_url: string
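The numeric ranges in the `types` section above (cpu 2..64, memory 4..256 GB, disk 10..1000 GB, nodes 3..100) can be checked before execution. The following is an illustrative sketch of such a validator, not part of typedialog-ag itself; the function and field names are assumptions.

```python
# Hypothetical validator mirroring the NodeConfig and input ranges in
# the agent definition above. Not typedialog-ag's implementation.

RANGES = {
    "cpu": (2, 64),       # cores
    "memory": (4, 256),   # GB
    "disk": (10, 1000),   # GB
}

def validate_node_config(config: dict, nodes: int) -> list[str]:
    """Return a list of constraint violations (empty means valid)."""
    errors = []
    if not 3 <= nodes <= 100:
        errors.append(f"nodes={nodes} outside 3..100")
    for field, (lo, hi) in RANGES.items():
        value = config.get(field)
        if value is None or not lo <= value <= hi:
            errors.append(f"{field}={value} outside {lo}..{hi}")
    return errors

print(validate_node_config({"cpu": 8, "memory": 32, "disk": 100}, nodes=5))  # []
print(validate_node_config({"cpu": 1, "memory": 32, "disk": 100}, nodes=5))
```

A failed check would correspond to `typedialog ag check` rejecting the agent inputs before any infrastructure is touched.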

Usage:

# Type-check agent
typedialog ag check deploy-k8s.agent.yaml

# Run agent interactively
typedialog ag run deploy-k8s.agent.yaml \
  --provider upcloud \
  --name production-k8s \
  --nodes 5 \
  --node-config '{"cpu": 8, "memory": 32, "disk": 100}'

# Transpile to Nickel
typedialog ag transpile deploy-k8s.agent.yaml > deploy-k8s.ncl

# Execute generated Nickel
provisioning apply deploy-k8s.ncl

Generated Nickel Output (example):

{
  metadata = {
    agent = "deploy-k8s"
    version = "1.0"
    generated_at = "2026-01-16T01:47:00Z"
  }

  resources = {
    network = {
      provider = "upcloud"
      vpc = { cidr = "10.0.0.0/16" }
    }

    compute = {
      provider = "upcloud"
      nodes = [
        { count = 5, cpu = 8, memory = 32, disk = 100 }
      ]
    }

    kubernetes = {
      version = "1.28.0"
      high_availability = true
      monitoring = {
        enabled = true
        stack = "prometheus-grafana"
      }
      backup = {
        enabled = true
        interval = "daily"
      }
    }
  }
}

Agent Features:

| Feature | Purpose |
|---------|---------|
| Dependencies | Declare task ordering (depends_on) |
| Parallelism | Run independent tasks in parallel |
| Conditionals | Execute tasks based on input conditions |
| Type Safety | Strong typing on inputs and outputs |
| Rollback | Automatic rollback on failure |
| Logging | Full execution trace for debugging |

Integration with Provisioning

Using typedialog-ai in Forms

# .typedialog/provisioning/form.toml
[[elements]]
name = "database_type"
prompt = "form-database_type-prompt"
type = "select"
options = ["postgres", "mysql", "mongodb"]

# Enable AI suggestions
[elements.ai_suggestions]
enabled = true
context = "workload"
provider = "typedialog-ai"
endpoint = "http://localhost:9000/suggest/database"

Using typedialog-ag in Workflows

# Define agent-based workflow
provisioning workflow define \
  --agent deploy-k8s.agent.yaml \
  --name k8s-deployment \
  --auto-execute

# Run workflow
provisioning workflow run k8s-deployment \
  --provider upcloud \
  --nodes 5

Performance

typedialog-ai

  • Suggestion latency: 500ms-2s
  • Database queries: <100ms (cached)
  • Concurrent users: 50+
  • SurrealDB storage: <1GB for 10K suggestions

typedialog-ag

  • Type checking: <100ms per agent
  • Transpilation: <500ms to Nickel
  • Parallel task execution: O(1) overhead
  • Agent memory: <50MB per agent

Configuration

Enable AI in Provisioning

# provisioning/config/config.defaults.toml
[ai]
enabled = true
typedialog_ai = true
typedialog_ag = true

[ai.typedialog]
ai_server_url = "http://localhost:9000"
ag_executable = "typedialog-ag"

[ai.form_suggestions]
enabled = true
providers = ["database", "network", "security"]
confidence_threshold = 0.75

AI Service Crate

The AI Service crate (provisioning/platform/crates/ai-service/) is the central AI processing microservice for Provisioning. It coordinates LLM integration, knowledge retrieval, and infrastructure recommendation generation.

Architecture

Core Modules

The AI Service is organized into specialized modules:

| Module | Purpose |
|--------|---------|
| config.rs | Configuration management and AI service settings |
| service.rs | Main service logic and request handling |
| mcp.rs | Model Context Protocol integration for LLM tools |
| knowledge.rs | Knowledge base management and retrieval |
| dag.rs | Directed Acyclic Graph for workflow orchestration |
| handlers.rs | HTTP endpoint handlers |
| tool_integration.rs | Tool registration and execution |

Request Flow

User Request (natural language)
    ↓
Handlers (HTTP endpoint)
    ↓
Intent Recognition (config.rs)
    ↓
Knowledge Retrieval (knowledge.rs)
    ↓
MCP Tool Selection (mcp.rs)
    ↓
LLM Processing (external provider)
    ↓
DAG Execution Planning (dag.rs)
    ↓
Infrastructure Generation
    ↓
Response to User

Configuration

Environment Variables

# LLM Configuration
export PROVISIONING_AI_PROVIDER=openai
export PROVISIONING_AI_MODEL=gpt-4
export PROVISIONING_AI_API_KEY=sk-...

# Service Configuration
export PROVISIONING_AI_PORT=9091
export PROVISIONING_AI_LOG_LEVEL=info
export PROVISIONING_AI_TIMEOUT=30

# Knowledge Base
export PROVISIONING_AI_KNOWLEDGE_PATH=~/.provisioning/knowledge
export PROVISIONING_AI_CACHE_TTL=3600

# RAG Configuration
export PROVISIONING_AI_RAG_ENABLED=true
export PROVISIONING_AI_RAG_SIMILARITY_THRESHOLD=0.75

Configuration File

# provisioning/config/ai-service.toml
[ai_service]
port = 9091
timeout = 30
max_concurrent_requests = 100

[llm]
provider = "openai"                 # openai, anthropic, local
model = "gpt-4"
api_key = "${PROVISIONING_AI_API_KEY}"
temperature = 0.7
max_tokens = 2000

[knowledge]
enabled = true
path = "~/.provisioning/knowledge"
cache_ttl = 3600
update_interval = 3600

[rag]
enabled = true
similarity_threshold = 0.75
max_results = 5
embedding_model = "text-embedding-3-small"

[dag]
max_parallel_tasks = 10
timeout_per_task = 60
enable_rollback = true

[security]
validate_inputs = true
rate_limit = 1000                   # requests/minute
audit_logging = true

HTTP API

Endpoints

Create Infrastructure Request

POST /v1/infrastructure/create
Content-Type: application/json

{
  "request": "Create 3 web servers with load balancing",
  "context": {
    "workspace": "production",
    "provider": "upcloud",
    "environment": "prod"
  },
  "options": {
    "auto_apply": false,
    "return_nickel": true,
    "validate": true
  }
}

Response:

{
  "request_id": "req-12345",
  "status": "success",
  "infrastructure": {
    "servers": [
      {"name": "web-01", "cpu": 4, "memory": 8},
      {"name": "web-02", "cpu": 4, "memory": 8},
      {"name": "web-03", "cpu": 4, "memory": 8}
    ],
    "load_balancer": {"name": "lb-01", "type": "round-robin"}
  },
  "nickel_config": "{ servers = [...] }",
  "confidence": 0.92,
  "notes": ["All servers in same availability zone", "Load balancer configured for health checks"]
}

Analyze Configuration

POST /v1/configuration/analyze
Content-Type: application/json

{
  "configuration": "{ name = \"server-01\", cpu = 2, memory = 4 }",
  "context": {"provider": "upcloud", "environment": "prod"}
}

Response:

{
  "analysis": {
    "resources": {
      "cpu_score": "low",
      "memory_score": "minimal",
      "recommendation": "Increase to cpu=4, memory=8 for production"
    },
    "security": {
      "findings": ["No backup configured", "No monitoring"],
      "recommendations": ["Enable automated backups", "Deploy monitoring agent"]
    },
    "cost": {
      "estimated_monthly": "$45",
      "optimization_potential": "20% cost reduction possible"
    }
  }
}

Generate Policies

POST /v1/policies/generate
Content-Type: application/json

{
  "requirements": "Allow developers to create servers but not delete, admins full access",
  "format": "cedar"
}

Response:

{
  "policies": [
    {
      "effect": "permit",
      "principal": {"role": "developer"},
      "action": "CreateServer",
      "resource": "Server::*"
    },
    {
      "effect": "permit",
      "principal": {"role": "admin"},
      "action": ["CreateServer", "DeleteServer", "ModifyServer"],
      "resource": "Server::*"
    }
  ],
  "format": "cedar",
  "validation": "valid"
}

Get Suggestions

GET /v1/suggestions?context=database&workload=transactional&scale=large

Response:

{
  "suggestions": [
    {
      "type": "database",
      "recommendation": "PostgreSQL 15 with pgvector",
      "rationale": "Optimal for transactional workload with vector support",
      "confidence": 0.95,
      "config": {
        "engine": "postgres",
        "version": "15",
        "extensions": ["pgvector"],
        "replicas": 3,
        "backup": "daily"
      }
    }
  ]
}

Get Health Status

GET /v1/health

Response:

{
  "status": "healthy",
  "version": "0.1.0",
  "llm": {
    "provider": "openai",
    "model": "gpt-4",
    "available": true
  },
  "knowledge": {
    "documents": 1250,
    "last_update": "2026-01-16T01:00:00Z"
  },
  "rag": {
    "enabled": true,
    "embeddings": 1250,
    "search_latency_ms": 45
  },
  "uptime_seconds": 86400
}

MCP Tool Integration

Available Tools

The AI Service registers tools with the MCP server for LLM access:

// Tools available to LLM
tools = [
  "create_infrastructure",
  "analyze_configuration",
  "generate_policies",
  "get_recommendations",
  "query_knowledge_base",
  "estimate_costs",
  "check_compatibility",
  "validate_nickel"
]

Tool Definitions

{
  "name": "create_infrastructure",
  "description": "Create infrastructure from natural language description",
  "parameters": {
    "type": "object",
    "properties": {
      "request": {"type": "string"},
      "provider": {"type": "string"},
      "context": {"type": "object"}
    },
    "required": ["request"]
  }
}
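Before dispatching a tool call, the service can check it against the schema's `required` list. A hedged sketch of that check using the `create_infrastructure` definition above (the helper is illustrative, not the actual tool_integration.rs logic):

```python
# Minimal required-parameter check for an MCP tool call, using the
# create_infrastructure tool definition shown above.

TOOL_SCHEMA = {
    "name": "create_infrastructure",
    "parameters": {
        "type": "object",
        "properties": {
            "request": {"type": "string"},
            "provider": {"type": "string"},
            "context": {"type": "object"},
        },
        "required": ["request"],
    },
}

def missing_required(schema: dict, args: dict) -> list[str]:
    """Names of required parameters absent from the call arguments."""
    required = schema["parameters"].get("required", [])
    return [name for name in required if name not in args]

print(missing_required(TOOL_SCHEMA, {"provider": "upcloud"}))   # ['request']
print(missing_required(TOOL_SCHEMA, {"request": "3 servers"}))  # []
```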

Knowledge Base

Structure

knowledge/
├── infrastructure/         # Infrastructure patterns
│   ├── kubernetes/
│   ├── databases/
│   ├── networking/
│   └── security/
├── patterns/               # Design patterns
│   ├── high-availability/
│   ├── disaster-recovery/
│   └── performance/
├── providers/              # Provider-specific docs
│   ├── aws/
│   ├── upcloud/
│   └── hetzner/
└── best-practices/         # Best practices
    ├── security/
    ├── operations/
    └── cost-optimization/

Updating Knowledge

# Add new knowledge document
curl -X POST http://localhost:9091/v1/knowledge/add \
  -H "Content-Type: application/json" \
  -d '{
    "category": "kubernetes",
    "title": "HA Kubernetes Setup",
    "content": "..."
  }'

# Update embeddings
curl -X POST http://localhost:9091/v1/knowledge/reindex

# Get knowledge status
curl http://localhost:9091/v1/knowledge/status

DAG Execution

Workflow Planning

The AI Service uses DAGs to plan complex infrastructure deployments:

Validate Config
    ├→ Create Network
    │   └→ Create Nodes
    │       └→ Install Kubernetes
    │           ├→ Add Monitoring (optional)
    │           └→ Setup Backup (optional)
    │
    └→ Verify Compatibility
        └→ Estimate Costs

Task Execution

# Execute DAG workflow
curl -X POST http://localhost:9091/v1/workflow/execute \
  -H "Content-Type: application/json" \
  -d '{
    "dag": {
      "tasks": [
        {"name": "validate", "action": "validate_config"},
        {"name": "network", "action": "create_network", "depends_on": ["validate"]},
        {"name": "nodes", "action": "create_nodes", "depends_on": ["network"]}
      ]
    }
  }'
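The DAG payload above can be ordered with a standard topological sort (Kahn's algorithm): a task becomes runnable once every task it depends on has completed. This is a sketch of the idea, not the dag.rs implementation:

```python
# Order DAG tasks so every task runs after its dependencies
# (Kahn's algorithm). Task names match the curl payload above.
from collections import deque

def execution_order(tasks: list[dict]) -> list[str]:
    """Return task names in a valid dependency order."""
    deps = {t["name"]: set(t.get("depends_on", [])) for t in tasks}
    ready = deque(name for name, d in deps.items() if not d)
    order = []
    while ready:
        name = ready.popleft()
        order.append(name)
        for other, d in deps.items():
            if name in d:
                d.discard(name)
                if not d and other not in order and other not in ready:
                    ready.append(other)
    if len(order) != len(deps):
        raise ValueError("cycle detected in DAG")
    return order

tasks = [
    {"name": "validate", "action": "validate_config"},
    {"name": "network", "action": "create_network", "depends_on": ["validate"]},
    {"name": "nodes", "action": "create_nodes", "depends_on": ["network"]},
]
print(execution_order(tasks))  # ['validate', 'network', 'nodes']
```

Tasks that become ready in the same pass have no ordering constraints between them and could run in parallel, which is how independent branches of the deployment DAG execute concurrently.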

Performance Characteristics

Latency

| Operation | Latency |
|-----------|---------|
| Intent recognition | 50-100ms |
| Knowledge retrieval | 100-200ms |
| LLM inference | 2-5 seconds |
| Nickel generation | 500ms-1s |
| DAG planning | 100-500ms |
| Policy generation | 1-2 seconds |

Throughput

  • Concurrent requests: 100+
  • QPS: 50+ requests/second
  • Knowledge search: <50ms for 1000+ documents

Resource Usage

  • Memory: 500MB-2GB (with cache)
  • CPU: 1-4 cores
  • Storage: 10GB-50GB (knowledge base)
  • Network: 10Mbps-100Mbps (LLM requests)

Monitoring & Observability

Metrics

# Prometheus metrics exposed at /metrics
provisioning_ai_requests_total{endpoint="/v1/infrastructure/create"}
provisioning_ai_request_duration_seconds{endpoint="/v1/infrastructure/create"}
provisioning_ai_llm_tokens{provider="openai", model="gpt-4"}
provisioning_ai_knowledge_documents_total
provisioning_ai_cache_hit_ratio

Logging

# View AI Service logs
provisioning logs service ai-service --tail 100

# Debug mode
PROVISIONING_AI_LOG_LEVEL=debug provisioning service start ai-service

Troubleshooting

LLM Connection Issues

# Test LLM connection
curl http://localhost:9091/v1/health

# Check configuration
provisioning config get ai.llm

# View logs
provisioning logs service ai-service --filter "llm|openai"

Slow Knowledge Retrieval

# Check knowledge base status
curl http://localhost:9091/v1/knowledge/status

# Reindex embeddings
curl -X POST http://localhost:9091/v1/knowledge/reindex

# Monitor RAG performance
curl http://localhost:9091/v1/rag/benchmark

RAG & Knowledge Base

The RAG (Retrieval Augmented Generation) system enhances AI-generated infrastructure with domain-specific knowledge. It retrieves relevant documentation, best practices, and patterns to inform infrastructure recommendations.

Architecture

Components

User Query
    ↓
Query Embedder (text-embedding-3-small)
    ↓
Vector Similarity Search (SurrealDB)
    ↓
Knowledge Retrieval (semantic matching)
    ↓
Context Augmentation
    ↓
LLM Processing (with knowledge context)
    ↓
Infrastructure Recommendation

Knowledge Flow

Documentation Input
    ↓
Document Chunking (512 tokens)
    ↓
Semantic Embedding
    ↓
Vector Storage (SurrealDB)
    ↓
Similarity Indexing
    ↓
Query Time Retrieval
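The chunking step above splits documents into ~512-token pieces with some overlap so that context is not lost at chunk boundaries (matching `chunk_size = 512` and `chunk_overlap = 50` in the RAG configuration). A sketch, using whitespace tokens as a stand-in for the real tokenizer, which this page does not specify:

```python
# Sliding-window chunking: each chunk shares `overlap` tokens with
# the next one. Whitespace splitting approximates tokenization.

def chunk_tokens(tokens: list[str], size: int = 512, overlap: int = 50) -> list[list[str]]:
    """Split tokens into overlapping chunks of at most `size` tokens."""
    if size <= overlap:
        raise ValueError("chunk size must exceed overlap")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break
    return chunks

doc = "word " * 1000
chunks = chunk_tokens(doc.split())
print(len(chunks), len(chunks[0]), len(chunks[-1]))
```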

Knowledge Base Organization

Document Categories

| Category | Purpose | Examples |
|----------|---------|----------|
| Infrastructure | IaC patterns and templates | Kubernetes, databases, networking |
| Best Practices | Operational guidelines | HA patterns, disaster recovery |
| Provider Guides | Cloud provider documentation | AWS, UpCloud, Hetzner specifics |
| Performance | Optimization guidelines | Resource sizing, caching strategies |
| Security | Security hardening guides | Encryption, authentication, compliance |
| Troubleshooting | Common issues and solutions | Performance, deployment, debugging |

Document Structure

id: "doc-k8s-ha-001"
category: "infrastructure"
subcategory: "kubernetes"
title: "High Availability Kubernetes Cluster Setup"
tags: ["kubernetes", "high-availability", "production"]
created: "2026-01-10T00:00:00Z"
updated: "2026-01-16T00:00:00Z"

content: |
  # High Availability Kubernetes Cluster

  For production Kubernetes deployments, ensure:
  - Minimum 3 control planes
  - Distributed across availability zones
  - etcd with persistent storage
  - CNI plugin with network policies

embedding: [0.123, 0.456]
metadata:
  provider: ["aws", "upcloud", "hetzner"]
  environment: ["production"]
  cost_profile: "medium"

RAG Retrieval Process

When processing a user query, the system:

  1. Embed Query: Convert natural language to vector
  2. Search Index: Find similar documents (cosine similarity > threshold)
  3. Rank Results: Score by relevance
  4. Extract Context: Select top N chunks
  5. Augment Prompt: Add context to LLM request

Example:

User Query: "Create a Kubernetes cluster in AWS with auto-scaling"

Vector Embedding: [0.234, 0.567, 0.891]

Top Matches:
1. "HA Kubernetes Setup" (similarity: 0.94)
2. "AWS Auto-Scaling Patterns" (similarity: 0.87)
3. "Kubernetes Security Hardening" (similarity: 0.76)

Retrieved Context:
- Minimum 3 control planes for HA
- Use AWS ASGs with cluster autoscaler
- Enable Pod Disruption Budgets
- Configure network policies

LLM Prompt with Context:
"Create a Kubernetes cluster with the following context:
[...retrieved knowledge...]
User request: Create a Kubernetes cluster in AWS with auto-scaling"
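The retrieval walk-through above reduces to cosine similarity over embeddings plus a threshold filter. A sketch with toy 3-dimensional vectors (real text-embedding-3-small vectors have 1536 dimensions; the document titles and values here are illustrative):

```python
# Cosine-similarity retrieval over toy embeddings, filtered by the
# configured similarity threshold and result limit.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve(query_vec, docs, threshold=0.75, limit=5):
    """Rank documents by similarity, keeping those above the threshold."""
    scored = [(cosine(query_vec, vec), title) for title, vec in docs]
    scored.sort(reverse=True)
    return [(round(s, 2), t) for s, t in scored if s >= threshold][:limit]

docs = [
    ("HA Kubernetes Setup", [0.24, 0.56, 0.88]),
    ("Unrelated billing FAQ", [0.9, -0.4, 0.1]),
]
print(retrieve([0.234, 0.567, 0.891], docs))
```

Only documents clearing `similarity_threshold` reach the LLM prompt, which keeps the augmented context focused on relevant knowledge.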

Configuration

[rag]
enabled = true
similarity_threshold = 0.75
max_results = 5
chunk_size = 512
chunk_overlap = 50

[embeddings]
model = "text-embedding-3-small"
provider = "openai"
cache_embeddings = true

[vector_store]
backend = "surrealdb"
index_type = "hnsw"
ef_construction = 400
ef_search = 200

[retrieval]
bm25_weight = 0.3
semantic_weight = 0.7
date_boost = 0.1

Managing Knowledge

Adding Documents

Via API:

curl -X POST http://localhost:9091/v1/knowledge/add \
  -H "Content-Type: application/json" \
  -d '{
    "category": "infrastructure",
    "title": "PostgreSQL HA Setup",
    "content": "For production PostgreSQL: 3+ replicas, streaming replication",
    "tags": ["database", "postgresql", "ha"],
    "metadata": {
      "provider": ["aws", "upcloud"],
      "environment": ["production"]
    }
  }'

Batch Import:

# Import from markdown files
provisioning ai knowledge import \
  --source ./docs/knowledge \
  --category infrastructure \
  --auto-tag

# Import from existing documentation
provisioning ai knowledge import \
  --source provisioning/docs/src \
  --recursive

Organizing Knowledge

# List knowledge documents
provisioning ai knowledge list --category infrastructure

# Search knowledge base
provisioning ai knowledge search "kubernetes high availability"

# View document
provisioning ai knowledge view doc-k8s-ha-001

# Update document
provisioning ai knowledge update doc-k8s-ha-001 \
  --content "Updated content..." \
  --tags "kubernetes,ha,production,v1.28"

# Delete document
provisioning ai knowledge delete doc-k8s-ha-001

Reindexing

# Reindex all documents
provisioning ai knowledge reindex --all

# Reindex specific category
provisioning ai knowledge reindex --category infrastructure

# Check indexing status
provisioning ai knowledge index-status

# Rebuild vector index
provisioning ai knowledge rebuild-vectors --model text-embedding-3-small

Knowledge Query API

Search Endpoint

POST /v1/knowledge/search
Content-Type: application/json

{
  "query": "kubernetes cluster setup",
  "category": "infrastructure",
  "tags": ["kubernetes"],
  "limit": 5,
  "similarity_threshold": 0.75,
  "metadata_filter": {
    "provider": ["aws", "upcloud"],
    "environment": ["production"]
  }
}

Response:

{
  "results": [
    {
      "id": "doc-k8s-ha-001",
      "title": "High Availability Kubernetes Cluster",
      "category": "infrastructure",
      "similarity": 0.94,
      "excerpt": "For production Kubernetes deployments, ensure minimum 3 control planes",
      "tags": ["kubernetes", "ha", "production"],
      "metadata": {
        "provider": ["aws", "upcloud", "hetzner"],
        "environment": ["production"]
      }
    }
  ],
  "search_time_ms": 45,
  "total_matches": 12
}

Knowledge Quality

Maintenance

# Check knowledge quality
provisioning ai knowledge quality-report

# Remove duplicate documents
provisioning ai knowledge deduplicate

# Fix broken references
provisioning ai knowledge validate-refs

# Update outdated docs
provisioning ai knowledge mark-outdated \
  --category infrastructure \
  --older-than 180d

Metrics

# Knowledge base statistics
curl http://localhost:9091/v1/knowledge/stats

Response:

{
  "total_documents": 1250,
  "total_chunks": 8432,
  "categories": {
    "infrastructure": 450,
    "security": 200,
    "best_practices": 300
  },
  "embedding_coverage": 0.98,
  "indexed_chunks": 8256,
  "vector_index_size_mb": 245,
  "last_reindex": "2026-01-15T23:00:00Z"
}

Hybrid Search

RAG uses hybrid search combining semantic and keyword matching:

BM25 Score (Keyword Match): 0.7
Semantic Score (Vector Similarity): 0.92

Hybrid Score = (0.3 × 0.7) + (0.7 × 0.92)
             = 0.21 + 0.644
             = 0.854

Relevance: High ✓
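The arithmetic above, as a function. The weights match the [hybrid_search] configuration (0.3 keyword, 0.7 semantic); the function itself is an illustrative sketch:

```python
# Weighted blend of keyword (BM25) and vector-similarity scores,
# reproducing the worked example: 0.3*0.7 + 0.7*0.92 = 0.854.

def hybrid_score(bm25: float, semantic: float,
                 bm25_weight: float = 0.3, semantic_weight: float = 0.7) -> float:
    """Hybrid relevance score as a weighted sum of both signals."""
    return bm25_weight * bm25 + semantic_weight * semantic

score = hybrid_score(0.7, 0.92)
print(round(score, 3))  # 0.854
```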

Configuration

[hybrid_search]
bm25_weight = 0.3
semantic_weight = 0.7

Performance

Retrieval Latency

| Operation | Latency |
|-----------|---------|
| Embed query (512 tokens) | 100-200ms |
| Vector similarity search | 20-50ms |
| BM25 keyword search | 10-30ms |
| Hybrid ranking | 5-10ms |
| Total retrieval | 50-100ms |

Vector Index Size

  • Documents: 1000 → 8GB storage
  • Documents: 10000 → 80GB storage
  • Search latency: Consistent <50ms regardless of size (with HNSW indexing)

Security & Privacy

Access Control

# Restrict knowledge access
provisioning ai knowledge acl set doc-k8s-ha-001 \
  --read "admin,developer" \
  --write "admin"

# Audit knowledge access
provisioning ai knowledge audit --document doc-k8s-ha-001

Data Protection

  • Sensitive Info: Automatically redacted from queries (API keys, passwords)
  • Document Encryption: Optional at-rest encryption
  • Query Logging: Audit trail for compliance

[security]
redact_patterns = ["password", "api_key", "secret"]
encrypt_documents = true
audit_queries = true

Natural Language Infrastructure

Use natural language to describe infrastructure requirements and get automatically generated Nickel configurations and deployment plans.

Overview

Natural Language Infrastructure (NLI) allows requesting infrastructure changes in plain English:

# Instead of writing complex Nickel...
provisioning ai "Deploy a 3-node HA PostgreSQL cluster with automatic backups in AWS"

# Or interactively...
provisioning ai interactive

# Interactive mode guides you through requirements

How It Works

Request Processing Pipeline

User Natural Language Input
    ↓
Intent Recognition
    ├─ Extract resource type (server, database, cluster)
    ├─ Identify constraints (HA, region, size)
    └─ Detect options (monitoring, backup, encryption)
    ↓
RAG Knowledge Retrieval
    ├─ Find similar deployments
    ├─ Retrieve best practices
    └─ Get provider-specific guidance
    ↓
LLM Inference (GPT-4, Claude 3)
    ├─ Generate Nickel schema
    ├─ Calculate resource requirements
    └─ Create deployment plan
    ↓
Configuration Validation
    ├─ Type checking via Nickel compiler
    ├─ Schema validation
    └─ Constraint verification
    ↓
Infrastructure Deployment
    ├─ Dry-run simulation
    ├─ Cost estimation
    └─ User confirmation
    ↓
Execution & Monitoring

Command Usage

Simple Requests

# Web servers with load balancing
provisioning ai "Create 3 web servers with load balancer"

# Database setup
provisioning ai "Deploy PostgreSQL with 2 replicas and daily backups"

# Kubernetes cluster
provisioning ai "Create production Kubernetes cluster with Prometheus monitoring"

Complex Requests

# Multi-cloud deployment
provisioning ai "
  Deploy:
  - 3 HA Kubernetes clusters (AWS, UpCloud, Hetzner)
  - PostgreSQL 15 with synchronous replication
  - Redis cluster for caching
  - ELK stack for logging
  - Prometheus for monitoring
  Constraints:
  - Cross-region high availability
  - Encrypted inter-region communication
  - Auto-scaling based on CPU (70%)
"

# Disaster recovery setup
provisioning ai "
  Set up disaster recovery for production environment:
  - Active-passive failover to secondary region
  - Daily automated backups (30-day retention)
  - Monthly DR tests with automated reports
  - RTO: 4 hours, RPO: 1 hour
  - Test failover every week
"

Interactive Mode

# Start interactive mode
provisioning ai interactive

# System asks clarifying questions:
# Q: What type of infrastructure? (server, database, cluster, other)
# Q: Which cloud provider? (aws, upcloud, hetzner, local)
# Q: Production or development?
# Q: High availability required?
# Q: Expected load? (small, medium, large, enterprise)
# Q: Monitoring and logging?
# Q: Backup strategy?

# Shows generated configuration for approval

Example: Web Application Deployment

Request

provisioning ai "
  Deploy a production web application:
  - Frontend: 3 nginx servers with auto-scaling
  - API: 5 FastAPI instances behind load balancer
  - Database: HA PostgreSQL with read replicas
  - Cache: Redis cluster
  - Monitoring: Prometheus and Grafana
  - Logs: Elasticsearch + Kibana
  Environment: AWS
  Region: us-east-1 (primary), eu-west-1 (DR)
  Budget: $5000/month max
"

Generated Configuration

The system generates:

1. Nickel Infrastructure Definition:

{
  metadata = {
    generated_by = "provisioning-ai"
    timestamp = "2026-01-16T01:47:00Z"
    confidence = 0.94
  }

  infrastructure = {
    frontend = {
      servers = 3
      cpu = 4
      memory = 8
      type = "t3.large"
      auto_scaling = {
        min = 3
        max = 10
        target_cpu = 70
      }
    }

    api = {
      servers = 5
      cpu = 8
      memory = 16
      type = "t3.xlarge"
      load_balancer = {
        type = "application"
        health_check = "/health"
      }
    }

    database = {
      type = "postgresql"
      version = "15"
      replicas = 2
      backup = {
        enabled = true
        frequency = "daily"
        retention_days = 30
      }
    }

    cache = {
      type = "redis"
      nodes = 3
      cluster_mode = true
      replication = true
    }

    monitoring = {
      prometheus = true
      grafana = true
      retention_days = 30
    }

    logging = {
      elasticsearch = true
      kibana = true
      retention_days = 30
    }
  }

  provider = "aws"
  region_primary = "us-east-1"
  region_dr = "eu-west-1"

  cost_estimate = {
    monthly = "$4850"
    breakdown = {
      compute = "$2500"
      database = "$1200"
      cache = "$600"
      monitoring = "$400"
      networking = "$150"
    }
  }
}

2. Deployment Plan:

Deployment Plan: Web Application (Production)

Phase 1: Network & Storage (2-3 hours)
  - Create VPCs in us-east-1 and eu-west-1
  - Set up inter-region VPN
  - Create EBS volumes for database
  - Create EFS for shared storage

Phase 2: Compute Instances (4-5 hours)
  - Launch 3 frontend servers
  - Launch 5 API servers
  - Create load balancers
  - Set up auto-scaling groups

Phase 3: Databases (3-4 hours)
  - Create PostgreSQL primary
  - Create read replicas
  - Configure replication
  - Run initial backup

Phase 4: Cache & Services (2-3 hours)
  - Create Redis cluster
  - Deploy Prometheus
  - Deploy Grafana
  - Deploy Elasticsearch/Kibana

Phase 5: Configuration (2-3 hours)
  - Configure health checks
  - Set up monitoring alerts
  - Configure log shipping
  - Deploy TLS certificates

Total Estimated Time: 13-18 hours

3. Cost Breakdown:

Monthly Cost Estimate: $4,850

Compute               $2,500  (EC2 instances)
Database              $1,200  (RDS PostgreSQL)
Cache                 $600    (ElastiCache Redis)
Monitoring            $400    (CloudWatch + Grafana)
Networking            $150    (NAT Gateway, VPN)

4. Risk Assessment:

Warnings:
- Budget nearly at limit: $4,850 of $5,000 max
- Cross-region networking latency: 80-100ms
- Database failover time: 1-2 minutes

Recommendations:
- Implement connection pooling in API
- Use read replicas for analytics queries
- Consider spot instances for non-critical services (30% cost savings)

Output Formats

Get Deployment Script

# Get Bash deployment script
provisioning ai "..." --output bash > deploy.sh

# Get Nushell script
provisioning ai "..." --output nushell > deploy.nu

# Get Terraform
provisioning ai "..." --output terraform > main.tf

# Get Nickel (default)
provisioning ai "..." --output nickel > infrastructure.ncl

Save for Later

# Save configuration for review
provisioning ai "..." --save deployment-plan --review

# Deploy from saved plan
provisioning apply deployment-plan

# Compare with current state
provisioning diff deployment-plan

Configuration

LLM Provider Selection

# Use OpenAI (default)
export PROVISIONING_AI_PROVIDER=openai
export PROVISIONING_AI_MODEL=gpt-4

# Use Anthropic
export PROVISIONING_AI_PROVIDER=anthropic
export PROVISIONING_AI_MODEL=claude-3-opus

# Use local model
export PROVISIONING_AI_PROVIDER=local
export PROVISIONING_AI_MODEL=llama2:70b

Response Options

# ~/.config/provisioning/ai.yaml
natural_language:
  output_format: nickel              # nickel, terraform, bash, nushell
  include_cost_estimate: true
  include_risk_assessment: true
  include_deployment_plan: true
  auto_review: false                 # Require approval before deploy
  dry_run: true                       # Simulate before execution
  confidence_threshold: 0.85          # Reject low-confidence results

  style:
    verbosity: detailed
    include_alternatives: true
    explain_reasoning: true

Advanced Features

Conditional Infrastructure

provisioning ai "
  Deploy web cluster:
  - If environment is production: HA setup with 5 nodes
  - If environment is staging: Standard setup with 2 nodes
  - If environment is dev: Single node with development tools
"

Cost-Optimized Variants

# Generate cost-optimized alternative
provisioning ai "..." --optimize-for cost

# Generate performance-optimized alternative
provisioning ai "..." --optimize-for performance

# Generate high-availability alternative
provisioning ai "..." --optimize-for availability

Template-Based Generation

# Use existing templates as base
provisioning ai "..." --template kubernetes-ha

# List available templates
provisioning ai templates list

Safety & Validation

Review Before Deploy

# Generate and review (no auto-execute)
provisioning ai "..." --review

# Review generated Nickel
cat deployment-plan.ncl

# Validate configuration
provisioning validate deployment-plan.ncl

# Dry-run to see what changes
provisioning apply --dry-run deployment-plan.ncl

# Apply after approval
provisioning apply deployment-plan.ncl

Rollback Support

# Create deployment with automatic rollback
provisioning ai "..." --with-rollback

# Manual rollback if issues
provisioning workflow rollback --to-checkpoint

# View deployment history
provisioning history list --type infrastructure

Limitations

  • Context Window: Very large infrastructure descriptions may exceed LLM limits
  • Ambiguity: Unclear requirements may produce suboptimal configurations
  • Provider Specifics: Some provider-specific features may require manual adjustment
  • Cost: API calls incur per-token charges
  • Latency: Processing takes 2-10 seconds depending on complexity