<p align="center">
  <img src="resources/provisioning_logo.svg" alt="Provisioning Logo" width="300"/>
</p>
<p align="center">
  <img src="resources/logo-text.svg" alt="Provisioning" width="500"/>
</p>

# Provisioning - Infrastructure Automation Platform

> **A modular, declarative Infrastructure as Code (IaC) platform for managing complete infrastructure lifecycles**

## Table of Contents

- [What is Provisioning?](#what-is-provisioning)
- [Why Provisioning?](#why-provisioning)
- [Core Concepts](#core-concepts)
- [Architecture](#architecture)
- [Key Features](#key-features)
- [Technology Stack](#technology-stack)
- [How It Works](#how-it-works)
- [Use Cases](#use-cases)
- [Getting Started](#getting-started)

---

## What is Provisioning?

**Provisioning** is a comprehensive **Infrastructure as Code (IaC)** platform designed to manage
complete infrastructure lifecycles: cloud providers, infrastructure services, clusters,
and isolated workspaces across multiple cloud and local environments.

Extensible and customizable by design, it delivers type-safe, configuration-driven workflows
with enterprise security (encrypted configuration, Cosmian KMS integration, Cedar policy engine,
secrets management, authorization and permissions control, compliance checking, anomaly detection)
and adaptable deployment modes (interactive UI, CLI automation, unattended CI/CD)
suitable for any scale from development to production.

### Technical Definition

Declarative Infrastructure as Code (IaC) platform providing:

- **Type-safe, configuration-driven workflows** with schema validation and constraint checking
- **Modular, extensible architecture**: cloud providers, task services, clusters, workspaces
- **Multi-cloud abstraction layer** with unified API (UpCloud, AWS, local infrastructure)
- **High-performance state management**:
  - Graph database backend for complex relationships
  - Real-time state tracking and queries
  - Multi-model data storage (document, graph, relational)
- **Enterprise security stack**:
  - Encrypted configuration and secrets management
  - Cosmian KMS integration for confidential key management
  - Cedar policy engine for fine-grained access control
  - Authorization and permissions control via platform services
  - Compliance checking and policy enforcement
  - Anomaly detection for security monitoring
  - Audit logging and compliance tracking
- **Hybrid orchestration**: Rust-based performance layer + scripting flexibility
- **Production-ready features**:
  - Batch workflows with dependency resolution
  - Checkpoint recovery and automatic rollback
  - Parallel execution with state management
- **Adaptable deployment modes**:
  - Interactive TUI for guided setup
  - Headless CLI for scripted automation
  - Unattended mode for CI/CD pipelines
- **Hierarchical configuration system** with inheritance and overrides

### What It Does

- **Provisions Infrastructure** - Create servers, networks, storage across multiple cloud providers
- **Installs Services** - Deploy Kubernetes, containerd, databases, monitoring, and 50+ infrastructure components
- **Manages Clusters** - Orchestrate complete cluster deployments with dependency management
- **Handles Configuration** - Hierarchical configuration system with inheritance and overrides
- **Orchestrates Workflows** - Batch operations with parallel execution and checkpoint recovery
- **Manages Secrets** - SOPS/Age integration for encrypted configuration
- **Secures Infrastructure** - Enterprise security with JWT, MFA, Cedar policies, audit logging
- **Optimizes Performance** - Native plugins providing 10-50x speed improvements

---

## Why Provisioning?

### The Problems It Solves

#### 1. **Multi-Cloud Complexity**

**Problem**: Each cloud provider has different APIs, tools, and workflows.

**Solution**: Unified abstraction layer with provider-agnostic interfaces. Write configuration once in Nickel schemas and deploy to any supported provider.

```nickel
# Same configuration works on UpCloud, AWS, or local infrastructure
{
  servers = [
    {
      name = "web-01",
      plan = "medium",      # Abstract size, provider-specific translation
      provider = "upcloud"  # Switch to "aws" or "local" as needed
    }
  ]
}
```

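
The "provider-specific translation" of an abstract plan can be pictured as a lookup keyed by provider. The sketch below is illustrative Python, not the platform's own code (which is Rust and Nushell). The `PLAN_TABLE` contents and the `translate_plan` name are assumptions invented for this example, and the real plan identifiers used by each provider extension may differ.

```python
# Hypothetical mapping from abstract plan sizes to provider-specific plan names.
# The identifiers on the right are placeholders chosen for illustration only.
PLAN_TABLE = {
    "upcloud": {"small": "1xCPU-2GB", "medium": "2xCPU-4GB", "large": "4xCPU-8GB"},
    "aws": {"small": "t3.small", "medium": "t3.medium", "large": "t3.large"},
    "local": {"small": "vm-small", "medium": "vm-medium", "large": "vm-large"},
}


def translate_plan(provider: str, plan: str) -> str:
    """Return the provider-specific plan name for an abstract plan size."""
    try:
        return PLAN_TABLE[provider][plan]
    except KeyError:
        raise ValueError(f"no mapping for plan {plan!r} on provider {provider!r}") from None


server = {"name": "web-01", "plan": "medium", "provider": "upcloud"}
print(translate_plan(server["provider"], server["plan"]))
```

Switching `provider` to `"aws"` changes only the lookup result; the server definition itself stays the same, which is the point of the abstraction.
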
#### 2. **Dependency Hell**

**Problem**: Infrastructure components have complex dependencies (Kubernetes needs containerd, Cilium needs Kubernetes, etc.).

**Solution**: Automatic dependency resolution with topological sorting and health checks via Nickel schemas.

```nickel
# Provisioning resolves: containerd → etcd → kubernetes → cilium
{
  taskservs = ["cilium"]  # Automatically installs all dependencies
}
```

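
As a rough illustration of dependency resolution by topological sorting, the following Python sketch expands a requested service into its transitive dependencies and orders them so that dependencies come first. The `DEPENDS_ON` table and the `install_order` function are hypothetical stand-ins; the dependency edges are taken from the examples in this README, not from the actual task service declarations.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency declarations (service -> services it requires).
DEPENDS_ON = {
    "containerd": [],
    "etcd": [],
    "kubernetes": ["containerd", "etcd"],
    "cilium": ["kubernetes"],
}


def install_order(requested):
    """Return requested services plus transitive dependencies, dependencies first."""
    graph = {}
    stack = list(requested)
    while stack:
        name = stack.pop()
        if name in graph:
            continue
        deps = DEPENDS_ON[name]  # KeyError here means an unknown service
        graph[name] = deps
        stack.extend(deps)
    # static_order() raises graphlib.CycleError if the declarations contain a cycle.
    return list(TopologicalSorter(graph).static_order())


print(install_order(["cilium"]))
```

Requesting only `cilium` yields a list that also contains `containerd`, `etcd`, and `kubernetes`, each placed before anything that needs it. The relative order of independent services (here `containerd` and `etcd`) is not significant.
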
#### 3. **Configuration Sprawl**

**Problem**: Environment variables, hardcoded values, scattered configuration files.

**Solution**: Hierarchical configuration system with 476+ config accessors replacing 200+ ENV variables.

```text
Defaults → User → Project → Infrastructure → Environment → Runtime
```

#### 4. **Imperative Scripts**

**Problem**: Brittle shell scripts that don't handle failures, don't support rollback, and are hard to maintain.

**Solution**: Declarative Nickel configurations with validation, type safety, lazy evaluation, and automatic rollback.

#### 5. **Lack of Visibility**

**Problem**: No insight into what's happening during deployment, which makes failures hard to debug.

**Solution**:

- Real-time workflow monitoring
- Comprehensive logging system
- Web-based control center
- REST API for integration

#### 6. **No Standardization**

**Problem**: Each team builds its own deployment tools, with no shared patterns.

**Solution**: Reusable task services, cluster templates, and workflow patterns.

---

## Core Concepts

### 1. **Providers**

Cloud infrastructure backends that handle resource provisioning.

- **UpCloud** - Primary cloud provider
- **AWS** - Amazon Web Services integration
- **Local** - Local infrastructure (VMs, Docker, bare metal)

Providers implement a common interface, making infrastructure code portable.

### 2. **Task Services (TaskServs)**

Reusable infrastructure components that can be installed on servers.

**Categories**:

- **Container Runtimes** - containerd, Docker, Podman, crun, runc, youki
- **Orchestration** - Kubernetes, etcd, CoreDNS
- **Networking** - Cilium, Flannel, Calico, ip-aliases
- **Storage** - Rook-Ceph, local storage
- **Databases** - PostgreSQL, Redis, SurrealDB
- **Observability** - Prometheus, Grafana, Loki
- **Security** - Webhook, KMS, Vault
- **Development** - Gitea, Radicle, ORAS

Each task service includes:

- Version management
- Dependency declarations
- Health checks
- Installation/uninstallation logic
- Configuration schemas

### 3. **Clusters**

Complete infrastructure deployments combining servers and task services.

**Examples**:

- **Kubernetes Cluster** - HA control plane + worker nodes + CNI + storage
- **Database Cluster** - Replicated PostgreSQL with backup
- **Build Infrastructure** - BuildKit + container registry + CI/CD

Clusters handle:

- Multi-node coordination
- Service distribution
- High availability
- Rolling updates

### 4. **Workspaces**

Isolated environments for different projects or deployment stages.

```text
workspace_librecloud/     # Production workspace
├── infra/                # Infrastructure definitions
├── config/               # Workspace configuration
├── extensions/           # Custom modules
└── runtime/              # State and runtime data

workspace_dev/            # Development workspace
├── infra/
└── config/
```

Switch between workspaces with a single command:

```bash
provisioning workspace switch librecloud
```

### 5. **Workflows**

Coordinated sequences of operations with dependency management.

**Types**:

- **Server Workflows** - Create/delete/update servers
- **TaskServ Workflows** - Install/remove infrastructure services
- **Cluster Workflows** - Deploy/scale complete clusters
- **Batch Workflows** - Multi-cloud parallel operations

**Features**:

- Dependency resolution
- Parallel execution
- Checkpoint recovery
- Automatic rollback
- Progress monitoring

---

## Architecture

### System Components

```text
┌─────────────────────────────────────────────────────────────────┐
│                      User Interface Layer                       │
│  • CLI (provisioning command)                                   │
│  • Web Control Center (UI)                                      │
│  • REST API                                                     │
└─────────────────────────────────────────────────────────────────┘
                                 ↓
┌─────────────────────────────────────────────────────────────────┐
│                        Core Engine Layer                        │
│  • Command Routing & Dispatch                                   │
│  • Configuration Management                                     │
│  • Provider Abstraction                                         │
│  • Utility Libraries                                            │
└─────────────────────────────────────────────────────────────────┘
                                 ↓
┌─────────────────────────────────────────────────────────────────┐
│                       Orchestration Layer                       │
│  • Workflow Orchestrator (Rust/Nushell hybrid)                  │
│  • Dependency Resolver                                          │
│  • State Manager                                                │
│  • Task Scheduler                                               │
└─────────────────────────────────────────────────────────────────┘
                                 ↓
┌─────────────────────────────────────────────────────────────────┐
│                         Extension Layer                         │
│  • Providers (Cloud APIs)                                       │
│  • Task Services (Infrastructure Components)                    │
│  • Clusters (Complete Deployments)                              │
│  • Workflows (Automation Templates)                             │
└─────────────────────────────────────────────────────────────────┘
                                 ↓
┌─────────────────────────────────────────────────────────────────┐
│                      Infrastructure Layer                       │
│  • Cloud Resources (Servers, Networks, Storage)                 │
│  • Kubernetes Clusters                                          │
│  • Running Services                                             │
└─────────────────────────────────────────────────────────────────┘
```

### Directory Structure

```text
project-provisioning/
├── provisioning/               # Core provisioning system
│   ├── core/                   # Core engine and libraries
│   │   ├── cli/                # Command-line interface
│   │   ├── nulib/              # Core Nushell libraries
│   │   ├── plugins/            # System plugins (Rust native)
│   │   └── scripts/            # Utility scripts
│   │
│   ├── extensions/             # Extensible components
│   │   ├── providers/          # Cloud provider implementations
│   │   ├── taskservs/          # Infrastructure service definitions
│   │   ├── clusters/           # Complete cluster configurations
│   │   └── workflows/          # Core workflow templates
│   │
│   ├── platform/               # Platform services
│   │   ├── orchestrator/       # Rust orchestrator service
│   │   ├── control-center/     # Web control center
│   │   ├── mcp-server/         # Model Context Protocol server
│   │   ├── api-gateway/        # REST API gateway
│   │   ├── oci-registry/       # OCI registry for extensions
│   │   └── installer/          # Platform installer (TUI + CLI)
│   │
│   ├── schemas/                # Nickel schema definitions (PRIMARY IaC)
│   │   ├── main.ncl            # Main infrastructure schema
│   │   ├── providers/          # Provider-specific schemas
│   │   ├── infrastructure/     # Infra definitions
│   │   ├── deployment/         # Deployment schemas
│   │   ├── services/           # Service schemas
│   │   ├── operations/         # Operations schemas
│   │   └── generator/          # Runtime schema generation
│   │
│   ├── docs/                   # Product documentation (mdBook)
│   ├── config/                 # Configuration examples
│   ├── tools/                  # Build and distribution tools
│   └── justfiles/              # Just recipes for common tasks
│
├── workspace/                  # User workspaces and data
│   ├── infra/                  # Infrastructure definitions
│   ├── config/                 # User configuration
│   ├── extensions/             # User extensions
│   └── runtime/                # Runtime data and state
│
├── docs/                       # Architecture & Development docs
│   ├── architecture/           # System design and ADRs
│   └── development/            # Development guidelines
│
└── .github/                    # CI/CD workflows
    └── workflows/              # GitHub Actions (Rust, Nickel, Nushell)
```

### Platform Services

#### 1. **Orchestrator** (`platform/orchestrator/`)

- **Language**: Rust + Nushell
- **Purpose**: Workflow execution, task scheduling, state management
- **Features**:
  - File-based persistence
  - Priority processing
  - Retry logic with exponential backoff
  - Checkpoint-based recovery
  - REST API endpoints

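
To make "retry logic with exponential backoff" concrete, here is a minimal Python sketch of the general technique. It is not the orchestrator's implementation; the function names and the default values for `base`, `factor`, and `cap` are assumptions chosen for illustration.

```python
import random
import time


def backoff_delays(max_attempts, base=0.5, factor=2.0, cap=30.0):
    """Delays (seconds) to wait after each failed attempt except the last."""
    return [min(cap, base * factor**i) for i in range(max_attempts - 1)]


def run_with_retry(task, max_attempts=4, sleep=time.sleep, jitter=0.0):
    """Call task() until it succeeds or max_attempts is reached."""
    delays = backoff_delays(max_attempts)
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Optional jitter spreads retries from many tasks apart.
            sleep(delays[attempt - 1] + random.uniform(0, jitter))


print(backoff_delays(4))
```

Each delay is the previous one multiplied by `factor`, capped at `cap` so that long outages do not produce unbounded waits. Passing a custom `sleep` callable makes the loop easy to exercise without real waiting.
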
#### 2. **Control Center** (`platform/control-center/`)

- **Language**: Web UI + Backend API
- **Purpose**: Web-based infrastructure management
- **Features**:
  - Dashboard views
  - Real-time monitoring
  - Interactive deployments
  - Log viewing

#### 3. **MCP Server** (`platform/mcp-server/`)

- **Language**: Nushell
- **Purpose**: Model Context Protocol integration for AI assistance
- **Features**:
  - 7 AI-powered settings tools
  - Intelligent config completion
  - Natural language infrastructure queries

#### 4. **OCI Registry** (`platform/oci-registry/`)

- **Purpose**: Extension distribution and versioning
- **Features**:
  - Task service packages
  - Provider packages
  - Cluster templates
  - Workflow definitions

#### 5. **Installer** (`platform/installer/`)

- **Language**: Rust (Ratatui TUI) + Nushell
- **Purpose**: Platform installation and setup
- **Features**:
  - Interactive TUI mode
  - Headless CLI mode
  - Unattended CI/CD mode
  - Configuration generation

---

## Key Features

### 1. **Modular CLI Architecture** (v3.2.0)

84% code reduction with domain-driven design.

- **Main CLI**: 211 lines (from 1,329 lines)
- **80+ shortcuts**: `s` → `server`, `t` → `taskserv`, etc.
- **Bi-directional help**: `provisioning help ws` = `provisioning ws help`
- **7 domain modules**: infrastructure, orchestration, development, workspace, configuration, utilities, generation

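
Shortcut expansion and bi-directional help can be thought of as one normalization pass over the argument list. The Python sketch below shows that idea only; the `ALIASES` table and `normalize` function are hypothetical, and the expansion of `ws` to `workspace` is an assumption inferred from the help example above.

```python
# Hypothetical alias table; `s` and `t` come from the list above,
# `ws` -> `workspace` is assumed for this example.
ALIASES = {"s": "server", "t": "taskserv", "ws": "workspace"}


def normalize(argv):
    """Expand the command shortcut and rewrite `<cmd> help` as `help <cmd>`."""
    args = list(argv)
    # Put `help` first so both spellings share one code path.
    if len(args) >= 2 and args[1] == "help":
        args[0], args[1] = args[1], args[0]
    # The command token sits after `help` when help was requested.
    pos = 1 if args[:1] == ["help"] and len(args) >= 2 else 0
    if args:
        args[pos] = ALIASES.get(args[pos], args[pos])
    return args


print(normalize(["ws", "help"]))
print(normalize(["s", "create", "--check"]))
```

Because both help spellings collapse to the same normalized form, the dispatcher behind it only needs to handle `help <command>`.
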
### 2. **Configuration System** (v2.0.0)

Hierarchical, config-driven architecture.

- **476+ config accessors** replacing 200+ ENV variables
- **Hierarchical loading**: defaults → user → project → infra → env → runtime
- **Variable interpolation**: `{{paths.base}}`, `{{env.HOME}}`, `{{now.date}}`
- **Multi-format support**: TOML, YAML, KCL (deprecated in favor of Nickel)

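
Variable interpolation of the `{{paths.base}}` form can be sketched as a regular-expression substitution that resolves dotted paths against a nested context. This is an illustrative Python sketch, not the configuration loader itself; the `interpolate` and `lookup` names and the sample context values are assumptions.

```python
import re

# Matches {{ dotted.path }} with optional inner whitespace.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z0-9_.]+)\s*\}\}")


def lookup(context, dotted):
    """Resolve a dotted path such as `paths.base` in nested dicts."""
    value = context
    for key in dotted.split("."):
        value = value[key]  # KeyError signals an unknown placeholder
    return value


def interpolate(template, context):
    """Replace every {{dotted.path}} placeholder with its value from context."""
    return PLACEHOLDER.sub(lambda m: str(lookup(context, m.group(1))), template)


# Sample values are placeholders for this example.
context = {"paths": {"base": "/opt/provisioning"}, "env": {"HOME": "/home/alice"}}
print(interpolate("{{paths.base}}/core/versions", context))
```

A dynamic placeholder such as `{{now.date}}` would fit the same scheme if the context is populated with a `now` entry at load time.
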
### 3. **Batch Workflow System** (v3.1.0)

Provider-agnostic batch operations with 85-90% token efficiency.

- **Multi-cloud support**: Mixed UpCloud + AWS + local in single workflow
- **KCL schema integration**: Type-safe workflow definitions (KCL is now deprecated; use Nickel for new work)
- **Dependency resolution**: Topological sorting with soft/hard dependencies
- **State management**: Checkpoint-based recovery with rollback
- **Real-time monitoring**: Live progress tracking

### 4. **Hybrid Orchestrator** (v3.0.0)

Rust/Nushell architecture solving deep call stack limitations.

- **High-performance coordination layer**
- **File-based persistence**
- **Priority processing with retry logic**
- **REST API for external integration**
- **Comprehensive workflow system**

### 5. **Workspace Switching** (v2.0.5)

Centralized workspace management.

- **Single-command switching**: `provisioning workspace switch <name>`
- **Automatic tracking**: Last-used timestamps, active workspace markers
- **User preferences**: Global settings across all workspaces
- **Workspace registry**: Centralized configuration in `user_config.yaml`

### 6. **Interactive Guides** (v3.3.0)

Step-by-step walkthroughs and quick references.

- **Quick reference**: `provisioning sc` (fastest)
- **Complete guides**: from-scratch, update, customize
- **Copy-paste ready**: All commands include placeholders
- **Beautiful rendering**: Uses glow, bat, or less

### 7. **Test Environment Service** (v3.4.0)

Automated container-based testing.

- **Three test types**: Single taskserv, server simulation, multi-node clusters
- **Topology templates**: Kubernetes HA, etcd clusters, etc.
- **Auto-cleanup**: Optional automatic cleanup after tests
- **CI/CD integration**: Easy integration into pipelines

### 8. **Platform Installer** (v3.5.0)

Multi-mode installation system with TUI, CLI, and unattended modes.

- **Interactive TUI**: Beautiful Ratatui terminal UI with 7 screens
- **Headless Mode**: CLI automation for scripted installations
- **Unattended Mode**: Zero-interaction CI/CD deployments
- **Deployment Modes**: Solo (2 CPU/4GB), MultiUser (4 CPU/8GB), CICD (8 CPU/16GB), Enterprise (16 CPU/32GB)
- **MCP Integration**: 7 AI-powered settings tools for intelligent configuration

### 9. **Version Management System** (v3.6.0)

Centralized tool and provider version management with bash-compatible export.

- **Unified Version Source**: All versions defined in Nickel files (`versions.ncl` and provider `version.ncl`)
- **Generated Versions File**: Bash-compatible `KEY="VALUE"` format for shell scripts
- **Core Tools**: NUSHELL, NICKEL, SOPS, AGE, K9S with convenient aliases (NU for NUSHELL)
- **Provider Versions**: Automatically discovers and includes all provider versions (AWS, HCLOUD, UPCTL, etc.)
- **Command**: `provisioning setup versions` generates the `/provisioning/core/versions` file
- **Shell Integration**: Can be sourced directly in bash scripts: `source /provisioning/core/versions && echo $NU_VERSION`
- **Usage**:

  ```bash
  # Generate versions file
  provisioning setup versions

  # Use in bash scripts
  source /provisioning/core/versions
  echo "Using Nushell version: $NU_VERSION"
  echo "AWS CLI version: $PROVIDER_AWS_VERSION"
  ```

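
Because the generated file uses plain `KEY="VALUE"` lines, it can also be read from tools that are not shell scripts. The Python sketch below shows one way to do that, assuming the file contains only simple quoted assignments; the `parse_versions` function and the sample version numbers are hypothetical.

```python
import shlex


def parse_versions(text):
    """Parse KEY="VALUE" lines into a dict, skipping blanks and comments."""
    versions = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment
        # shlex.split strips shell quoting for simple values such as "1.2.3".
        parts = shlex.split(value)
        versions[key.strip()] = parts[0] if parts else ""
    return versions


# Sample content with made-up version numbers, for illustration only.
sample = 'NU_VERSION="0.109.0"\nPROVIDER_AWS_VERSION="1.2.3"\n'
print(parse_versions(sample))
```

In practice the text would come from reading the generated versions file rather than from an inline string.
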
### 10. **Nushell Plugins Integration** (v1.0.0)

Three native Rust plugins providing 10-50x performance improvements over the HTTP API.

- **Three Native Plugins**: auth, KMS, orchestrator
- **Performance Gains**:
  - KMS operations: ~5ms vs ~50ms (10x faster)
  - Orchestrator queries: ~1ms vs ~30ms (30x faster)
  - Auth verification: ~10ms vs ~50ms (5x faster)
- **OS-Native Keyring**: macOS Keychain, Linux Secret Service, Windows Credential Manager
- **KMS Backends**: RustyVault, Age, AWS KMS, Vault, Cosmian
- **Graceful Fallback**: Automatic fallback to HTTP if the plugins are not installed

### 11. **Complete Security System** (v4.0.0)

Enterprise-grade security with 39,699 lines across 12 components.

- **12 Components**: JWT Auth, Cedar Authorization, MFA (TOTP + WebAuthn), Secrets Management,
  KMS, Audit Logging, Break-Glass, Compliance, Audit Query, Token Management, Access Control, Encryption
- **Performance**: <20ms overhead per secure operation
- **Testing**: 350+ comprehensive test cases
- **API**: 83+ REST endpoints, 111+ CLI commands
- **Standards**: GDPR, SOC2, ISO 27001 compliance
- **Key Features**:
  - RS256-signed JWT authentication with Argon2id password hashing
  - Policy-as-code with hot reload
  - Multi-factor authentication (TOTP + WebAuthn/FIDO2)
  - Dynamic secrets (AWS STS, SSH keys) with TTL
  - 5 KMS backends with envelope encryption
  - 7-year audit retention with 5 export formats
  - Multi-party break-glass approval

---

## Technology Stack

### Core Technologies

| Technology | Version | Purpose | Why |
| ------------ | --------- | --------- | ----- |
| **Nickel** | Latest | PRIMARY - Infrastructure-as-code language | Type-safe schemas, lazy evaluation, LSP support, composable records |
| **Nushell** | 0.109.0+ | Scripting and task automation | Structured data pipelines, cross-platform, modern built-in parsers (JSON/YAML/TOML) |
| **Rust** | Latest | Platform services (orchestrator, control-center, installer) | Performance, memory safety, concurrency, reliability |
| **KCL** | DEPRECATED | Legacy configuration (fully replaced by Nickel) | Migration bridge available; use Nickel for new work |

### Data & State Management

| Technology | Version | Purpose | Features |
| ------------ | --------- | --------- | ---------- |
| **SurrealDB** | Latest | High-performance graph database backend | Multi-model (document, graph, relational), real-time queries, distributed |

### Platform Services (Rust-based)

| Service | Purpose | Security Features |
| --------- | --------- | ------------------- |
| **Orchestrator** | Workflow execution, task scheduling, state management | File-based persistence, retry logic, checkpoint recovery |
| **Control Center** | Web-based infrastructure management | **Authorization and permissions control**, RBAC, audit logging |
| **Installer** | Platform installation (TUI + CLI modes) | Secure configuration generation, validation |
| **API Gateway** | REST API for external integration | Authentication, rate limiting, request validation |
| **MCP Server** | AI-powered configuration management | 7 settings tools, intelligent config completion |
| **OCI Registry** | Extension distribution and versioning | Task services, providers, cluster templates |

### Security & Secrets

| Technology | Version | Purpose | Enterprise Features |
| ------------ | --------- | --------- | --------------------- |
| **SOPS** | 3.10.2+ | Secrets management | Encrypted configuration files |
| **Age** | 1.2.1+ | Encryption | Secure key-based encryption |
| **Cosmian KMS** | Latest | Key Management System | Confidential computing, secure key storage, cloud-native KMS |
| **Cedar** | Latest | Policy engine | Fine-grained access control, policy-as-code, compliance checking, anomaly detection |
| **RustyVault** | Latest | Transit encryption engine | 5ms encryption performance, multiple KMS backends |
| **JWT** | Latest | Authentication tokens | RS256 signatures, Argon2id password hashing |
| **Keyring** | Latest | OS-native secure storage | macOS Keychain, Linux Secret Service, Windows Credential Manager |

### Version Management

| Component | Purpose | Format |
| ----------- | --------- | -------- |
| **versions.ncl** | Core tool versions (Nickel primary) | Nickel schema |
| **provider version.ncl** | Provider-specific versions | Nickel schema |
| **provisioning setup versions** | Version file generator | Nushell command |
| **versions file** | Bash-compatible exports | KEY="VALUE" format |

**Usage**:

```bash
# Generate versions file from Nickel schemas
provisioning setup versions

# Source in shell scripts
source /provisioning/core/versions
echo $NU_VERSION $PROVIDER_AWS_VERSION
```

### Optional Tools

| Tool | Purpose |
| ------ | --------- |
| **K9s** | Kubernetes management interface |
| **nu_plugin_tera** | Nushell plugin for Tera template rendering |
| **nu_plugin_kcl** | Nushell plugin for KCL integration (CLI required, plugin optional) |
| **nu_plugin_auth** | Authentication plugin (5x faster auth, OS keyring integration) |
| **nu_plugin_kms** | KMS encryption plugin (10x faster, 5ms encryption) |
| **nu_plugin_orchestrator** | Orchestrator plugin (30-50x faster queries) |
| **glow** | Markdown rendering for interactive guides |
| **bat** | Syntax highlighting for file viewing and guides |

---

## How It Works

### Data Flow

```text
1. User defines infrastructure in Nickel schemas
   ↓
2. Nickel evaluates with type validation and lazy evaluation
   ↓
3. CLI loads configuration (hierarchical merging)
   ↓
4. Configuration validated against provider schemas
   ↓
5. Workflow created with operations
   ↓
6. Orchestrator receives workflow
   ↓
7. Dependencies resolved (topological sort)
   ↓
8. Operations executed in order (parallel where possible)
   ↓
9. Providers handle cloud operations
   ↓
10. Task services installed on servers
   ↓
11. State persisted and monitored
```

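
Steps 7 and 8 (resolve dependencies, then execute "parallel where possible") can be illustrated by grouping operations into waves, where every operation in a wave has all of its dependencies satisfied by earlier waves. The Python sketch below uses the standard-library `graphlib` module; the operation names and the `execution_waves` function are hypothetical, and the real orchestrator may schedule work differently.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def execution_waves(depends_on):
    """Group operations into waves; everything in one wave can run in parallel."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # operations whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical operations (operation -> operations it depends on).
ops = {
    "server-control-01": [],
    "server-worker-01": [],
    "containerd": ["server-control-01", "server-worker-01"],
    "etcd": ["server-control-01"],
    "kubernetes": ["containerd", "etcd"],
    "cilium": ["kubernetes"],
    "rook-ceph": ["kubernetes"],
}

for number, wave in enumerate(execution_waves(ops), start=1):
    print(number, wave)
```

With these declarations the two servers form the first wave, `containerd` and `etcd` the second, `kubernetes` the third, and `cilium` with `rook-ceph` the last.
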
### Example Workflow: Deploy Kubernetes Cluster

**Step 1**: Define infrastructure in Nickel

```nickel
# schemas/my-cluster.ncl
{
  metadata = {
    name = "my-cluster",
    provider = "upcloud",
    environment = "production"
  },

  infrastructure = {
    servers = [
      {name = "control-01", plan = "medium", role = "control"},
      {name = "worker-01", plan = "large", role = "worker"},
      {name = "worker-02", plan = "large", role = "worker"}
    ]
  },

  services = {
    taskservs = ["kubernetes", "cilium", "rook-ceph"]
  }
}
```

**Step 2**: Submit to Provisioning

```bash
provisioning server create --infra my-cluster
```

**Step 3**: Provisioning executes workflow

```text
1. Create workflow: "deploy-my-cluster"
2. Resolve dependencies:
   - containerd (required by kubernetes)
   - etcd (required by kubernetes)
   - kubernetes (explicitly requested)
   - cilium (explicitly requested, requires kubernetes)
   - rook-ceph (explicitly requested, requires kubernetes)

3. Execution order:
   a. Provision servers (parallel)
   b. Install containerd on all nodes
   c. Install etcd on control nodes
   d. Install kubernetes control plane
   e. Join worker nodes
   f. Install Cilium CNI
   g. Install Rook-Ceph storage

4. Checkpoint after each step
5. Monitor health checks
6. Report completion
```

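
"Checkpoint after each step" means a failed run can resume where it stopped instead of starting over. The Python sketch below shows that resume behavior with a JSON file listing completed steps. It is a simplified illustration under assumed names (`run_with_checkpoints`, the checkpoint file layout) and does not cover rollback.

```python
import json
from pathlib import Path


def run_with_checkpoints(steps, checkpoint_path):
    """Run (name, func) steps in order, recording each success on disk."""
    path = Path(checkpoint_path)
    done = json.loads(path.read_text()) if path.exists() else []
    for name, func in steps:
        if name in done:
            continue  # finished in an earlier run, skip it
        func()  # an exception here leaves the checkpoint at the last good step
        done.append(name)
        path.write_text(json.dumps(done))
    return done
```

Calling the function again with the same checkpoint path after a failure skips every step already recorded and continues with the one that failed.
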
**Step 4**: Verify deployment

```bash
provisioning cluster status my-cluster
```

### Configuration Hierarchy

Configuration values are resolved through a hierarchy, where each level overrides the one before it:

```text
1. System Defaults (provisioning/config/config.defaults.toml)
   ↓ (overridden by)
2. User Preferences (~/.config/provisioning/user_config.yaml)
   ↓ (overridden by)
3. Workspace Config (workspace/config/provisioning.yaml)
   ↓ (overridden by)
4. Infrastructure Config (workspace/infra/<name>/config.toml)
   ↓ (overridden by)
5. Environment Config (workspace/config/prod-defaults.toml)
   ↓ (overridden by)
6. Runtime Flags (--flag value)
```

**Example**:

```toml
# System default
[servers]
default_plan = "small"

# User preference
[servers]
default_plan = "medium"  # Overrides system default

# Infrastructure config
[servers]
default_plan = "large"   # Overrides user preference

# Runtime (command-line flag, not TOML):
#   provisioning server create --plan xlarge   # Overrides everything
```

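
The override chain above behaves like a deep merge applied from the lowest-precedence layer to the highest. The following Python sketch reproduces the example values; `deep_merge` and `resolve` are hypothetical helper names, and the extra `count` key is added only to show that settings not overridden by a higher layer survive the merge.

```python
def deep_merge(base, override):
    """Return a new dict where values in `override` win, merging nested dicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve(layers):
    """Merge layers from lowest to highest precedence."""
    result = {}
    for layer in layers:
        result = deep_merge(result, layer)
    return result


layers = [
    {"servers": {"default_plan": "small", "count": 1}},  # system defaults
    {"servers": {"default_plan": "medium"}},             # user preferences
    {"servers": {"default_plan": "large"}},              # infrastructure config
    {"servers": {"default_plan": "xlarge"}},             # runtime flag --plan xlarge
]
print(resolve(layers))
```

The runtime value wins for `default_plan`, while `count` keeps the value set by the defaults layer.
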
---

## Use Cases

### 1. **Multi-Cloud Kubernetes Deployment**

Deploy Kubernetes clusters across different cloud providers with identical configuration.

```bash
# UpCloud cluster
provisioning cluster create k8s-prod --provider upcloud

# AWS cluster (same config)
provisioning cluster create k8s-prod --provider aws
```

### 2. **Development → Staging → Production Pipeline**

Manage multiple environments with workspace switching.

```bash
# Development
provisioning workspace switch dev
provisioning cluster create app-stack

# Staging (same config, different resources)
provisioning workspace switch staging
provisioning cluster create app-stack

# Production (HA, larger resources)
provisioning workspace switch prod
provisioning cluster create app-stack
```

### 3. **Infrastructure as Code Testing**

Test infrastructure changes before deploying to production.

```bash
# Test Kubernetes upgrade locally
provisioning test topology load kubernetes_3node |
  test env cluster kubernetes --version 1.29.0

# Verify functionality
provisioning test env run <env-id>

# Cleanup
provisioning test env cleanup <env-id>
```

### 4. **Batch Multi-Region Deployment**

Deploy to multiple regions in parallel using Nickel batch workflows.

```nickel
# schemas/batch/multi-region.ncl
{
  batch_workflow = {
    operations = [
      {
        id = "eu-cluster",
        type = "cluster",
        region = "eu-west-1",
        cluster = "app-stack"
      },
      {
        id = "us-cluster",
        type = "cluster",
        region = "us-east-1",
        cluster = "app-stack"
      },
      {
        id = "asia-cluster",
        type = "cluster",
        region = "ap-south-1",
        cluster = "app-stack"
      }
    ],
    parallel_limit = 3  # All at once
  }
}
```

```bash
provisioning batch submit schemas/batch/multi-region.ncl
provisioning batch monitor <workflow-id>
```

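
The effect of `parallel_limit` can be pictured as a worker pool that runs at most that many operations at once. The Python sketch below models this with a thread pool; `run_batch` and `fake_deploy` are hypothetical names, and `fake_deploy` merely stands in for a real deployment call.

```python
from concurrent.futures import ThreadPoolExecutor


def run_batch(operations, deploy, parallel_limit):
    """Run deploy(op) for every operation, at most `parallel_limit` at a time."""
    with ThreadPoolExecutor(max_workers=parallel_limit) as pool:
        futures = {op["id"]: pool.submit(deploy, op) for op in operations}
        # result() re-raises any exception from the worker thread.
        return {op_id: future.result() for op_id, future in futures.items()}


operations = [
    {"id": "eu-cluster", "region": "eu-west-1", "cluster": "app-stack"},
    {"id": "us-cluster", "region": "us-east-1", "cluster": "app-stack"},
    {"id": "asia-cluster", "region": "ap-south-1", "cluster": "app-stack"},
]


def fake_deploy(op):
    # Stand-in for a real deployment call.
    return f"{op['cluster']} deployed to {op['region']}"


print(run_batch(operations, fake_deploy, parallel_limit=3))
```

With three operations and a limit of 3, all regions are processed concurrently; lowering the limit to 1 would run them one after another without changing the result.
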
### 5. **Automated Disaster Recovery**

Recreate infrastructure from configuration.

```bash
# Infrastructure destroyed
provisioning workspace switch prod

# Recreate from config
provisioning cluster create --infra backup-restore --wait

# All services restored with same configuration
```

### 6. **CI/CD Integration**

Automated testing and deployment pipelines.

```yaml
# .gitlab-ci.yml
test-infrastructure:
  script:
    - provisioning test quick kubernetes
    - provisioning test quick postgres

deploy-staging:
  script:
    - provisioning workspace switch staging
    - provisioning cluster create app-stack --check
    - provisioning cluster create app-stack --yes

deploy-production:
  when: manual
  script:
    - provisioning workspace switch prod
    - provisioning cluster create app-stack --yes
```

---

## Getting Started
|
||
|
||
### Quick Start
|
||
|
||
1. **Install Prerequisites**
|
||
|
||
```bash
|
||
# Install Nushell (0.109.0+)
|
||
brew install nushell # macOS
|
||
|
||
# Install Nickel (required for IaC)
|
||
brew install nickel # macOS or from source
|
||
|
||
# Install SOPS (optional, for encrypted secrets)
|
||
brew install sops
|
||
```
|
||
|
||
2. **Add CLI to PATH**
|
||
|
||
```bash
|
||
ln -sf "$(pwd)/provisioning/core/cli/provisioning" /usr/local/bin/provisioning
|
||
```
|
||
|
||
3. **Initialize Workspace**
|
||
|
||
```bash
|
||
provisioning workspace init my-project
|
||
cd my-project
|
||
```
|
||
|
||
3.5. **Generate Versions File** (Optional - for bash scripts)
|
||
|
||
```bash
|
||
provisioning setup versions
|
||
# Creates /provisioning/core/versions with all tool and provider versions
|
||
|
||
# Use in your deployment scripts
|
||
source /provisioning/core/versions
|
||
echo "Deploying with Nushell $NU_VERSION and AWS CLI $PROVIDER_AWS_VERSION"
|
||
```
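A deployment script may want to fail early when a sourced version is older than it expects. The following is a sketch under stated assumptions: `version_ge` is a hypothetical helper, and it assumes `NU_VERSION` holds a plain dotted numeric version such as `0.109.0`, which the generated file's format may or may not guarantee. After sourcing the versions file, a script could run `version_ge "$NU_VERSION" "0.109.0" || exit 1`.

```bash
# Sketch: compare dotted numeric versions field by field.
# version_ge is a hypothetical helper; it returns 0 when $1 >= $2.
version_ge() {
  _have="$1"
  _want="$2"
  while [ -n "$_have" ] || [ -n "$_want" ]; do
    # Take the leading field of each version.
    _h="${_have%%.*}"
    _w="${_want%%.*}"
    # Drop the field just read; an exhausted version counts as 0.
    case "$_have" in *.*) _have="${_have#*.}" ;; *) _have="" ;; esac
    case "$_want" in *.*) _want="${_want#*.}" ;; *) _want="" ;; esac
    if [ "${_h:-0}" -gt "${_w:-0}" ]; then return 0; fi
    if [ "${_h:-0}" -lt "${_w:-0}" ]; then return 1; fi
  done
  # All fields equal.
  return 0
}
```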

4. **Define Infrastructure (Nickel)**

```bash
# Create workspace infrastructure schema
cat > workspace/infra/my-cluster.ncl <<'EOF'
{
  metadata.name = "my-cluster",
  metadata.provider = "upcloud",

  infrastructure.servers = [
    {name = "control-01", plan = "medium"},
    {name = "worker-01", plan = "large"},
  ],

  services.taskservs = ["kubernetes", "cilium"],
}
EOF
```

5. **Deploy Infrastructure**

```bash
# Validate configuration
provisioning config validate

# Check what will be created
provisioning server create --check

# Create servers
provisioning server create --yes

# Install Kubernetes
provisioning taskserv create kubernetes
```

### Learning Path

1. **Start with Guides**

```bash
provisioning sc                    # Quick reference
provisioning guide from-scratch    # Complete walkthrough
```

2. **Explore Examples**

```bash
ls provisioning/examples/
```

3. **Read Architecture Docs**
   - [Core Engine](provisioning/core/README.md)
   - [CLI Architecture](.claude/features/cli-architecture.md)
   - [Configuration System](.claude/features/configuration-system.md)
   - [Batch Workflows](.claude/features/batch-workflow-system.md)

4. **Try Test Environments**

```bash
provisioning test quick kubernetes
provisioning test quick postgres
```

5. **Build Custom Extensions**
   - Create custom task services
   - Define cluster templates
   - Write workflow automation

---

## Documentation Index

### User & Operations Guides

See **[provisioning/docs/src/](provisioning/docs/src/)** for comprehensive documentation:

- **Quick Start** - Get started in 10 minutes
- **Command Reference** - Complete CLI command reference
- **Nickel Configuration Guide** - IaC language and patterns
- **Workspace Management** - Multi-workspace guide
- **Test Environment Guide** - Testing infrastructure with containers
- **Plugin Integration** - Native Rust plugins (10-50x faster)
- **Security System** - Authentication, MFA, KMS, Cedar policies
- **Operations** - Deployment, monitoring, incident response

### Architecture & Design Decisions

See **[docs/src/architecture/](docs/src/architecture/)** for design patterns:

- **System Architecture** - Multi-layer design
- **ADRs (Architecture Decision Records)** - Major decisions including:
  - ADR-011: Nickel Migration (from KCL)
  - ADR-012: Nushell + Nickel plugin wrapper
  - ADR-010: Configuration format strategy
- **Multi-Repo Strategy** - Repository organization
- **Integration Patterns** - How components interact

### Development Guidelines

- **[Repository Structure](docs/src/development/)** - Codebase organization
- **[Contributing Guide](CONTRIBUTING.md)** - How to contribute
- **[Nushell Guidelines](.claude/guidelines/nushell/)** - Best practices
- **[Nickel Guidelines](.claude/guidelines/nickel.md)** - IaC patterns
- **[Rust Guidelines](.claude/guidelines/rust/)** - Rust conventions

### API Reference

- **REST API** - HTTP endpoints in `provisioning/docs/src/api-reference/`
- **Nushell API** - Library functions and modules
- **Provider API** - Cloud provider interface specification

---

## Project Status

**Current Version**: v5.0.0-nickel (Production Ready) | **Date**: 2026-01-08

### Completed Milestones

- ✅ **v5.0.0** (2026-01-08) - **Nickel IaC Migration Complete**
  - Full KCL→Nickel migration
  - Schema-driven configuration system
  - Type-safe lazy evaluation
  - ~220 legacy files removed, ~250 new schema files added

- ✅ **v3.6.0** (2026-01-08) - Version Management System
  - Centralized tool and provider version management
  - Bash-compatible versions file generation
  - `provisioning setup versions` command
  - Automatic provider version discovery from Nickel schemas
  - Shell script integration with sourcing support

- ✅ **v4.0.0** (2025-10-09) - Complete Security System (12 components, 39,699 lines)
- ✅ **v3.5.0** (2025-10-07) - Platform Installer with TUI and CI/CD modes
- ✅ **v3.4.0** (2025-10-06) - Test Environment Service with container management
- ✅ **v3.3.0** (2025-09-30) - Interactive Guides system
- ✅ **v3.2.0** (2025-09-30) - Modular CLI Architecture (84% code reduction)
- ✅ **v3.1.0** (2025-09-25) - Batch Workflow System (85-90% token efficiency)
- ✅ **v3.0.0** (2025-09-25) - Hybrid Orchestrator (Rust/Nushell)
- ✅ **v2.0.5** (2025-10-02) - Workspace Switching system
- ✅ **v2.0.0** (2025-09-23) - Configuration System (476+ accessors)
- ✅ **v1.0.0** (2025-10-09) - Nushell Plugins Integration (10-50x performance)

### Current Focus

- **Nickel Ecosystem** - IDE support, LSP integration, schema libraries
- **Platform Consolidation** - GitHub Actions CI/CD, cross-platform testing
- **Extension Registry** - OCI-based distribution for task services and providers
- **Documentation** - Complete Nickel migration guides, ADR updates

---

## Support and Community

### Getting Help

- **Documentation**: Start with `provisioning help` or `provisioning guide from-scratch`
- **Issues**: Report bugs and request features on the issue tracker
- **Discussions**: Join community discussions for questions and ideas

### Contributing

Contributions are welcome! See [CONTRIBUTING.md](docs/development/CONTRIBUTING.md) for guidelines.

**Key areas for contribution**:

- New task service definitions
- Cloud provider implementations
- Cluster templates
- Documentation improvements
- Bug fixes and testing

---

## License

See [LICENSE](LICENSE) file in project root.

---

**Maintained By**: Architecture Team
**Last Updated**: 2026-05-12
**Current Branch**: nickel
**Project Home**: [provisioning/](provisioning/)

---

## Recent Changes (2026-05-12)

### ADR-025: Eager function-body parse + transitional fallback (2026-04-17)

**Architectural finding with lasting constraints**:

- Nushell parses `use` statements inside function bodies at **module-load time**, not call time. A subprocess boundary is the only true lazy-load mechanism.
- A single `provisioning-cli.nu` entry point was tested and rejected for hot paths: 3.1s vs 0.08–0.15s
  with thin handlers. All 15 dispatcher wrappers fire at module load regardless of which command runs.
- 22 unmapped commands are documented as migration debt (`universal-fallback-is-transitional` constraint).
- `bash-wrapper-has-no-runner-reference` was amended to permit the `provisioning-cli.nu` fallback during migration.
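
The dispatch pattern this finding implies can be sketched on the bash side: map each command to a thin handler script and start a separate `nu` process for only that handler, so Nushell parses nothing else. The sketch below is illustrative and does not reproduce the project's wrapper. The `handlers/` directory layout, the `resolve_handler` function, and the `PROVISIONING_LIB` variable are hypothetical; only `provisioning-cli.nu` as the transitional fallback comes from the ADR summary above.

```bash
# Sketch: pick the Nushell entry point for a command.
# The handlers/ layout and PROVISIONING_LIB are hypothetical names.
resolve_handler() {
  cmd="$1"
  lib="${PROVISIONING_LIB:-.}"
  handler="$lib/handlers/$cmd.nu"

  if [ -f "$handler" ]; then
    # Mapped command: a thin handler that imports only what it needs.
    echo "$handler"
  else
    # Unmapped command: transitional fallback to the single entry point.
    echo "$lib/provisioning-cli.nu"
  fi
}

# Each invocation would run in its own nu process, so only the chosen
# script and the modules it imports are parsed:
#   exec nu "$(resolve_handler "$1")" "$@"
```

Keeping the lookup in bash means the slower fallback path is paid only by commands that have not been migrated to a thin handler yet.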

### Platform Services Documentation (2026-02-03)

**All 10 platform services fully documented**:

- Services: vault, registry, control-center, rag, ai, mcp, daemon, orchestrator, detector, ui
- 50+ REST endpoints catalogued with method, path, auth requirements
- `start-local-binaries.nu`: automation script with dependency resolution for local development
- Local Services Setup Guide added to `docs/src/operations/`

### TypeDialog Migration (2026-01-09)

**`forminquire` fully replaced by `typedialog`**:

- All interactive forms migrated to TOML-driven `typedialog` schema
- TTY wrapper scripts for terminal-safe form rendering
- Core forms: `auth-login`, `mfa-enroll`, `setup-wizard`
- Infrastructure delete confirmations: server, cluster, taskserv, generic
- Platform forms: ai-service, control-center, extension-registry with Nickel fragment composition

### Nushell 0.110.0 Compatibility (2026-01-21)

- Fixed `try-catch` syntax across bootstrap, scripts, and typedialog Nu scripts
- Reviewed and updated: `export-toml.nu`, `external.nu`, `paths.nu`, `configure.nu`
- Removed obsolete `.coder/` session reports; documentation structure reorganized
- Config loading architecture document added: `docs/src/architecture/config-loading-architecture.md`