Ops/DevOps Portfolio: Modern Infrastructure End-to-End
The Problem
DevOps and platform teams face critical challenges managing modern infrastructure:
- Fragmented tools: Terraform for IaC, Ansible for configuration, Vault for secrets, all disconnected
- Untyped YAML: Configuration errors that explode at runtime, not at compile time
- Static cryptography: No preparation for future quantum threats
- Manual orchestration: Fragile imperative scripts without rollback or recovery
- Hidden costs: No visibility into LLM spending for infrastructure generation
- Complex multi-cloud: Different APIs, configurations and tools per provider
The Solution: An Integrated Ecosystem
Five projects designed to work together, covering the complete operations cycle.
Provisioning: Declarative Infrastructure as Code
Typed IaC with AI-Assisted Generation
Provisioning combines the precision of typed configuration (Nickel) with AI-assisted generation, eliminating fragile YAML and imperative scripts.
Unique capabilities:
- Nickel IaC: Typed configuration with lazy evaluation, pre-runtime validation
- MCP Server: Natural language queries about infrastructure
- Integrated RAG: 1,200+ domain documents for contextual responses
- Multi-cloud: AWS, UpCloud, local (LXD) from the same definition
Hybrid orchestration:
- Rust orchestrator for critical workflows (10-50x performance vs Python)
- Nushell scripts for flexibility and rapid prototyping
- Automatic dependency resolution (topological sorting)
- Checkpoints and automatic rollback on failures
The workflow:
"I need a K8s cluster on AWS with 3 nodes and Cilium"
↓
MCP Server (NLP)
↓
RAG searches similar configurations
↓
Generates Nickel + validates types
↓
Orchestrator deploys:
1. containerd (dependency)
2. etcd (dependency)
3. kubernetes (core)
4. cilium (CNI)
With checkpoints and automatic rollback
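The dependency ordering in the workflow above can be illustrated with Kahn's algorithm for topological sorting. This is a minimal sketch, not the Provisioning orchestrator's code: the function name `deploy_order`, the input shape, and the dependency edges used in the usage note are assumptions made for illustration.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// Returns an order in which every task appears after its dependencies,
/// or `None` if the dependency graph contains a cycle.
/// `deps` maps a task to the tasks it depends on (hypothetical input shape).
fn deploy_order<'a>(deps: &BTreeMap<&'a str, Vec<&'a str>>) -> Option<Vec<String>> {
    // Collect every task, including ones that only appear as a dependency.
    let mut tasks: BTreeSet<&str> = deps.keys().copied().collect();
    for list in deps.values() {
        tasks.extend(list.iter().copied());
    }

    // Count unmet dependencies per task and index the reverse edges.
    let mut pending: BTreeMap<&str, usize> = tasks.iter().map(|t| (*t, 0)).collect();
    let mut dependents: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for (task, list) in deps {
        for dep in list {
            *pending.get_mut(task).unwrap() += 1;
            dependents.entry(*dep).or_default().push(*task);
        }
    }

    // Kahn's algorithm: repeatedly emit tasks with no unmet dependencies.
    let mut ready: VecDeque<&str> = pending
        .iter()
        .filter(|(_, n)| **n == 0)
        .map(|(t, _)| *t)
        .collect();
    let mut order = Vec::new();
    while let Some(task) = ready.pop_front() {
        order.push(task.to_string());
        for next in dependents.get(task).into_iter().flatten() {
            let n = pending.get_mut(next).unwrap();
            *n -= 1;
            if *n == 0 {
                ready.push_back(*next);
            }
        }
    }

    // If some task was never emitted, the graph has a cycle.
    if order.len() == tasks.len() { Some(order) } else { None }
}
```

Assuming `kubernetes` depends on `containerd` and `etcd`, and `cilium` depends on `kubernetes`, the function yields the same sequence as the workflow: containerd, etcd, kubernetes, cilium. Sorted maps keep the result deterministic when several tasks are ready at once.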
Enterprise security:
- JWT + MFA (TOTP + WebAuthn)
- Cedar policy engine for RBAC/ABAC
- 7-year audit log retention
- 5 KMS backends (RustyVault, Age, AWS KMS, Vault, Cosmian)
- SOPS/Age for configuration encryption at rest
For whom:
- DevOps teams wanting typed IaC, not fragile YAML
- Multi-cloud organizations (AWS + UpCloud + on-premise)
- Teams needing audit, compliance and enterprise security
Expected results:
- Configuration errors detected at compile time, not at runtime
- Infrastructure generated from natural language (MCP + RAG)
- Automatic rollback on failures with state management
SecretumVault: Secrets Management with Post-Quantum Crypto
Rust Vault with PQC in Production
SecretumVault is a secrets management system that implements production-ready post-quantum cryptography (ML-KEM-768, ML-DSA-65), providing cryptographic agility for organizations deploying today.
Crypto-agnostic:
- OpenSSL: RSA, ECDSA, AES-256-GCM (classical compatibility)
- OQS (Post-Quantum): ML-KEM-768, ML-DSA-65 (NIST FIPS 203/204)
- AWS-LC: Experimental PQC (testing)
- RustCrypto: Pure-Rust implementations (testing)
- Pluggable backends: Change algorithms without modifying code
Secrets engines:
| Engine | Capability | Use cases |
|---|---|---|
| KV | Versioned secret storage | Credentials, API keys, sensitive configurations |
| Transit | Encryption-as-a-service with key rotation | Application data encryption, key rotation |
| PKI | X.509 certificate generation | mTLS, service mesh, internal infrastructure |
| Database | Dynamic credentials with TTL | PostgreSQL, MySQL, MongoDB credentials on-demand |
Multi-backend storage:
- Filesystem: Development, single-node, rapid prototyping
- etcd: Kubernetes, high availability, strong consistency
- SurrealDB: Complex queries, time-series, multi-tenant scopes
- PostgreSQL: Enterprise, ACID, complete auditing
Enterprise security:
- Shamir Secret Sharing for unsealing (configurable threshold)
- Cedar policy engine (ABAC, AWS-compatible)
- Native TLS/mTLS with X.509 certificates
- Complete audit logging with configurable retention
- Token management with TTL and renewal
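To make the configurable unseal threshold concrete, here is a minimal sketch of Shamir Secret Sharing over a prime field. It is an illustration only and makes no claim about how SecretumVault implements it: the field (integers modulo 2^61 - 1), the function names, and the caller-supplied coefficients are all assumptions. A real implementation must draw the coefficients from a cryptographically secure random source.

```rust
/// Field modulus: 2^61 - 1, a Mersenne prime (chosen only for this sketch).
const P: u128 = 2_305_843_009_213_693_951;

/// Modular exponentiation by squaring.
fn pow_mod(mut base: u128, mut exp: u128) -> u128 {
    let mut acc: u128 = 1;
    base %= P;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % P;
        }
        base = base * base % P;
        exp >>= 1;
    }
    acc
}

/// Splits `secret` into `shares` points on the polynomial
/// secret + coeffs[0]*x + coeffs[1]*x^2 + ...; the threshold is coeffs.len() + 1.
/// ASSUMPTION: `coeffs` are supplied by the caller and must be secret random values.
fn split(secret: u128, coeffs: &[u128], shares: u128) -> Vec<(u128, u128)> {
    (1..=shares)
        .map(|x| {
            // Horner's rule over the non-constant coefficients, highest degree first.
            let y = coeffs.iter().rev().fold(0u128, |acc, c| (acc * x + *c % P) % P);
            (x, (y * x + secret % P) % P)
        })
        .collect()
}

/// Recovers the secret by Lagrange interpolation at x = 0.
/// Requires at least `threshold` points with distinct x values.
fn combine(points: &[(u128, u128)]) -> u128 {
    let mut secret: u128 = 0;
    for (i, &(xi, yi)) in points.iter().enumerate() {
        let (mut num, mut den): (u128, u128) = (1, 1);
        for (j, &(xj, _)) in points.iter().enumerate() {
            if i != j {
                num = num * xj % P;
                den = den * ((xj + P - xi) % P) % P;
            }
        }
        // Fermat's little theorem: den^(P-2) is the inverse of den modulo P.
        secret = (secret + yi * num % P * pow_mod(den, P - 2)) % P;
    }
    secret
}
```

With two coefficients the threshold is 3, mirroring the 5-share, threshold-3 setup: any three of the five points returned by `split` reconstruct the secret through `combine`, while two points interpolate to a different value.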
Ops/DevOps workflow:
# Initialize vault with Shamir (5 shares, threshold 3)
svault operator init --shares 5 --threshold 3
# Unseal with 3 shares
svault operator unseal --share <share-1>
svault operator unseal --share <share-2>
svault operator unseal --share <share-3>
# Enable Database engine for PostgreSQL
svault secret engine enable database
svault secret database config postgres-prod \
plugin_name=postgresql-database-plugin \
connection_url="postgresql://{{username}}:{{password}}@postgres:5432/mydb" \
username="vault" password="vaultpass"
# Create role for dynamic credentials
svault secret database role create myapp-role \
db_name=postgres-prod \
creation_statements="CREATE USER \"{{name}}\" WITH PASSWORD '{{password}}' VALID UNTIL '{{expiration}}'; GRANT SELECT ON ALL TABLES IN SCHEMA public TO \"{{name}}\";" \
default_ttl=1h max_ttl=24h
# Get dynamic credentials (generated on-demand)
svault secret read database/creds/myapp-role
# Key Value
# --- -----
# lease_id database/creds/myapp-role/abc123
# lease_duration 3600
# username v-myapp-role-xyz789
# password A1b2C3d4E5f6G7h8
# Credentials are automatically revoked after 1h TTL
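The relationship between `default_ttl` and `max_ttl` in the role definition can be sketched as a simple clamp. This assumes the common lease convention (a request without an explicit TTL gets the default, and no lease may exceed the maximum); the function name and the optional requested TTL are hypothetical and not taken from SecretumVault.

```rust
use std::time::Duration;

/// Computes the lease duration for a credential request.
/// ASSUMPTION: a missing request TTL falls back to `default_ttl`,
/// and the result is never longer than `max_ttl`.
fn effective_ttl(requested: Option<Duration>, default_ttl: Duration, max_ttl: Duration) -> Duration {
    requested.unwrap_or(default_ttl).min(max_ttl)
}
```

Under these assumptions, a request with no TTL against the role above would receive a 3600-second lease, matching the `lease_duration` in the sample output, and a request for 48 hours would be capped at 24 hours.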
For whom:
- Teams deploying post-quantum cryptography today
- Organizations with cryptographic agility requirements
- Multi-cloud platforms needing Rust-native secrets management
- Security teams evaluating future quantum threats
Expected results:
- Preparation for quantum threats without changing architecture
- Secrets management with Rust memory guarantees
- Native integration with Provisioning (KMS) and Vapora (agent credentials)
Vapora: Agent Orchestration with Cost Control
Intelligent Agents for Operations
Vapora is not just for feature development. It's an orchestration platform that can coordinate specialized agents for DevOps operations.
Available agents for Ops:
- DevOps: CI/CD, pipelines, deployment automation
- Monitor: Health checks, alerting, real-time metrics
- Security: Auditing, compliance, vulnerability scanning
- ProjectManager: Roadmap, tracking, task coordination
Real cost control for LLMs:
- Budgets per role (monthly/weekly)
- Three levels: normal → near limit → exceeded
- Automatic fallback to cheaper providers without manual intervention
- Prometheus metrics: vapora_budget_utilization, vapora_fallback_triggers
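The three budget levels and the fallback rule can be sketched as a small classifier plus a provider picker. The 80% "near limit" threshold used in the usage note, the decision to fall back only once the budget is exceeded, and all names are assumptions for illustration; Vapora's actual policy may differ.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum BudgetLevel {
    Normal,
    NearLimit,
    Exceeded,
}

/// Classifies spend against a budget using integer arithmetic.
/// `near_percent` is a hypothetical threshold for the "near limit" level.
fn budget_level(spent_cents: u64, limit_cents: u64, near_percent: u64) -> BudgetLevel {
    if spent_cents >= limit_cents {
        BudgetLevel::Exceeded
    } else if spent_cents * 100 >= limit_cents * near_percent {
        BudgetLevel::NearLimit
    } else {
        BudgetLevel::Normal
    }
}

/// Picks a provider from a list ordered from preferred to cheapest.
/// ASSUMPTION: fallback happens only once the budget is exceeded.
fn pick_provider<'a>(level: BudgetLevel, providers: &[&'a str]) -> Option<&'a str> {
    match level {
        BudgetLevel::Exceeded => providers.last().copied(),
        BudgetLevel::Normal | BudgetLevel::NearLimit => providers.first().copied(),
    }
}
```

For a role with a 100.00 budget and an 80% threshold, spending 85.00 classifies as `NearLimit` and keeps the preferred provider, while spending 100.00 classifies as `Exceeded` and switches to the last, cheapest entry without manual intervention.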
NATS JetStream coordination:
┌──────────────────────────────────────────────────────┐
│ NATS JetStream Messaging │
├──────────────────────────────────────────────────────┤
│ │
│ vapora.tasks.assign → Task assignment │
│ vapora.tasks.results → Execution results │
│ vapora.agents.heartbeat → Agent health check │
│ │
│ Persistence: JetStream streams │
│ Delivery: At-least-once with acknowledgment │
│ Ordering: Per-subject message ordering │
└──────────────────────────────────────────────────────┘
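At-least-once delivery means a message can arrive more than once, for example when an acknowledgment is lost and the broker redelivers. Consumers therefore usually need to be idempotent. The sketch below shows one way to do that by remembering message IDs; the `TaskConsumer` type and the idea that each task message carries a unique ID are assumptions, and no NATS client API is used.

```rust
use std::collections::HashSet;

/// A consumer that tolerates redelivered messages by tracking IDs it has handled.
struct TaskConsumer {
    seen: HashSet<String>,
    processed: Vec<String>,
}

impl TaskConsumer {
    fn new() -> Self {
        TaskConsumer { seen: HashSet::new(), processed: Vec::new() }
    }

    /// Handles a message once. Returns `true` if the payload was processed now,
    /// `false` if the ID was already seen. The caller should acknowledge the
    /// message in both cases so the broker stops redelivering it.
    fn handle(&mut self, msg_id: &str, payload: &str) -> bool {
        if !self.seen.insert(msg_id.to_string()) {
            return false; // duplicate delivery: skip the side effect
        }
        self.processed.push(payload.to_string());
        true
    }
}
```

A long-running agent would also need to bound or persist the `seen` set; the in-memory set here only keeps the example short.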
Ops pipeline orchestration:
Pipeline: "Deploy microservice to K8s"
1. Security Agent: Docker image vulnerability scan
2. DevOps Agent: Validate K8s manifests + Helm charts
3. Monitor Agent: Set up Prometheus metrics + alerts
4. DevOps Agent: Deploy with kubectl apply + health check
5. Monitor Agent: Validate health endpoints + smoke tests
If any step fails: coordinated automatic rollback
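The "run in order, roll back on failure" behavior of such a pipeline can be sketched as follows. The `Step` struct, its function-pointer fields, and the return type are hypothetical; this is not Vapora's API, only an illustration of undoing completed steps in reverse order.

```rust
/// One pipeline step: a name, an action, and a compensating undo.
struct Step {
    name: &'static str,
    run: fn() -> Result<(), String>,
    undo: fn(),
}

/// Runs steps in order. On the first failure, undoes the completed steps
/// newest-first and returns the error together with the rolled-back step names.
fn run_pipeline(steps: &[Step]) -> Result<Vec<&'static str>, (String, Vec<&'static str>)> {
    let mut completed: Vec<&Step> = Vec::new();
    for step in steps {
        match (step.run)() {
            Ok(()) => completed.push(step),
            Err(reason) => {
                let mut rolled_back = Vec::new();
                for done in completed.iter().rev() {
                    (done.undo)();
                    rolled_back.push(done.name);
                }
                return Err((format!("{} failed: {}", step.name, reason), rolled_back));
            }
        }
    }
    Ok(completed.iter().map(|s| s.name).collect())
}
```

If a third step named `deploy` fails after `scan` and `validate` succeeded, the function undoes `validate` first and `scan` second. Reverse order matters because later steps may depend on the effects of earlier ones.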
Metrics and observability:
- Prometheus metrics endpoint (/metrics)
- OpenTelemetry integration (traces, spans)
- SurrealDB for execution storage
- Grafana dashboards for visualization
For whom:
- DevOps teams coordinating multiple LLM agents for operations
- Organizations needing to control LLM spending in automation
- Platforms with complex pipelines (CI/CD, deployment, monitoring)
Expected results:
- LLM cost reduction through intelligent routing
- Automatic orchestration of complex operational tasks
- Complete visibility of spending and performance per agent
TypeDialog: Multi-Backend Forms for Configuration
One Definition, Six Interfaces (Includes prov-gen)
TypeDialog unifies configuration capture across CLI, TUI, and Web, and adds a specialized backend (prov-gen) for multi-cloud IaC generation.
Operational backends:
| Backend | Typical Ops/DevOps use |
|---|---|
| CLI | Automation scripts, CI/CD pipelines |
| TUI | Admin tools, terminal dashboards |
| Web | Self-service portals, team forms |
| Prov-gen | Multi-cloud infrastructure generation |
Prov-gen Backend: IaC Generation
The prov-gen backend generates Nickel infrastructure configurations for multiple clouds from typed forms:
# cluster-setup.toml
[form]
id = "k8s_cluster"
title = "Kubernetes Cluster Setup"
[[sections]]
id = "cloud"
title = "Cloud Provider"
[[sections.fields]]
id = "provider"
type = "select"
label = "Provider"
required = true
options = [
{ value = "aws", label = "AWS" },
{ value = "upcloud", label = "UpCloud" },
{ value = "local", label = "Local LXD" },
]
[[sections.fields]]
id = "region"
type = "text"
label = "Region"
required = true
[[sections]]
id = "cluster"
title = "Cluster Configuration"
[[sections.fields]]
id = "node_count"
type = "number"
label = "Node Count"
default = 3
validation.min = 1
validation.max = 20
[[sections.fields]]
id = "node_size"
type = "select"
label = "Node Size"
options = [
{ value = "small", label = "Small (2 CPU, 4GB RAM)" },
{ value = "medium", label = "Medium (4 CPU, 8GB RAM)" },
{ value = "large", label = "Large (8 CPU, 16GB RAM)" },
]
[output]
backend = "prov-gen"
format = "nickel"
validation = "nickel://schemas/kubernetes_cluster.ncl"
Execute with prov-gen:
typedialog execute cluster-setup.toml --backend prov-gen --output k8s-cluster.ncl
Generates Nickel IaC:
# k8s-cluster.ncl (automatically generated)
{
provider = "aws",
region = "us-east-1",
servers = [
{
name = "k8s-control-plane-01",
plan = "medium",
role = "control-plane",
provider = "aws",
},
{
name = "k8s-worker-01",
plan = "medium",
role = "worker",
provider = "aws",
},
{
name = "k8s-worker-02",
plan = "medium",
role = "worker",
provider = "aws",
},
],
taskservs = [
"containerd",
"etcd",
"kubernetes",
"cilium",
],
networking = {
vpc_cidr = "10.0.0.0/16",
pod_cidr = "10.244.0.0/16",
service_cidr = "10.96.0.0/12",
},
}
Nickel contracts validation:
// Automatic validation with Nickel schemas
let validator = NickelValidator::new();
let result = validator.validate(&generated_iac, "schemas/kubernetes_cluster.ncl")?;
if result.errors.is_empty() {
// Valid IaC, ready for Provisioning
provisioning_client.apply(&generated_iac).await?;
} else {
// Validation errors, show to user
eprintln!("Validation errors: {:?}", result.errors);
}
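The step from a validated form value to a server list can be sketched as below. The bounds come from the form's `validation.min` and `validation.max` for `node_count`; the rule "one control-plane node, the rest workers" is only inferred from the three-server example output, and the function name is hypothetical rather than part of prov-gen.

```rust
/// Expands a node count into server names after checking the form's bounds.
/// ASSUMPTION: one control-plane node plus (node_count - 1) workers.
fn server_names(node_count: u32) -> Result<Vec<String>, String> {
    // Mirrors validation.min = 1 and validation.max = 20 from the form definition.
    if !(1..=20).contains(&node_count) {
        return Err(format!("node_count must be between 1 and 20, got {}", node_count));
    }
    let mut names = vec!["k8s-control-plane-01".to_string()];
    for i in 1..node_count {
        names.push(format!("k8s-worker-{:02}", i));
    }
    Ok(names)
}
```

With the form default of 3 nodes, this produces `k8s-control-plane-01`, `k8s-worker-01`, and `k8s-worker-02`, the same names as in the generated Nickel file. Out-of-range values are rejected before any configuration is generated.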
For whom:
- DevOps teams maintaining configuration wizards in CLI and Web
- Organizations with self-service infrastructure portals
- Teams needing IaC generation from forms
Expected results:
- One TOML definition for CLI, TUI, Web and IaC generation
- Typed validation before runtime with Nickel contracts
- Reduction of manual configuration errors
Kogral: Knowledge Base for Platform Teams
Your Ops Knowledge Base, Queryable
Kogral captures architectural decisions, runbooks, postmortems and operational procedures in a format that both humans and AI agents can query.
6 specialized node types for Ops:
| Type | Ops/DevOps use |
|---|---|
| Note | Runbooks, procedures, troubleshooting guides |
| Decision | Infrastructure ADRs (why AWS vs UpCloud, etcd vs Consul) |
| Guideline | Deployment standards, security policies |
| Pattern | Reusable infrastructure patterns (multi-AZ, HA) |
| Journal | Change logs, daily stand-up notes |
| Execution | Deployment history, rollbacks, incidents |
Git-native + MCP for Claude Code:
- Everything in versioned markdown (.kogral/ directory)
- MCP server for Claude Code: agents query runbooks before executing
- Semantic search with fastembed (local) or cloud embeddings
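Semantic search typically ranks documents by the similarity between embedding vectors. The sketch below uses cosine similarity over hand-written vectors that stand in for real embeddings; it does not call fastembed or any Kogral API, and the function names and document titles are hypothetical.

```rust
/// Cosine similarity between two vectors; returns 0.0 if either has zero length.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Returns the title of the document whose embedding is closest to the query.
fn top_match<'a>(query: &[f32], docs: &'a [(&'a str, Vec<f32>)]) -> Option<&'a str> {
    docs.iter()
        .max_by(|a, b| cosine(query, &a.1).total_cmp(&cosine(query, &b.1)))
        .map(|doc| doc.0)
}
```

In practice the vectors would come from an embedding model and have hundreds of dimensions, and a larger knowledge base would use an index instead of a linear scan. The ranking principle stays the same.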
The Ops flow:
Production incident → Capture postmortem in Kogral as Execution
↓
Claude Code queries via MCP → "How did we resolve this error before?"
↓
Kogral responds with similar postmortems + runbooks
↓
Agent applies documented solution instead of guessing
MCP Tools for Ops:
# Search troubleshooting runbooks
kogral-mcp search "nginx 502 error troubleshooting"
# Add incident postmortem
kogral-mcp add-execution \
--title "2026-01-22 PostgreSQL Connection Pool Exhaustion" \
--context "Production database connections maxed out" \
--resolution "Increased max_connections from 100 to 200, added PgBouncer" \
--tags "database,incident,postgresql"
# Get deployment guidelines
kogral-mcp get-guidelines "kubernetes deployment" --include-shared true
For whom:
- Platform teams needing to preserve operational knowledge
- SRE teams whose on-call rotation loses context from previous incidents
- DevOps using Claude Code wanting contextualized runbooks
Expected results:
- New SRE onboarding in days, not weeks
- Incident resolution informed by previous postmortems
- Infrastructure decisions preserved and searchable
The Ecosystem in Action: Ops Scenarios
Scenario 1: New Multi-Cloud Kubernetes Cluster
1. TypeDialog (prov-gen): Configuration wizard for cluster
- Cloud provider, region, node count, node size
- Generates validated Nickel IaC
2. Provisioning: Deploys infrastructure
- Creates servers on AWS/UpCloud
- Installs containerd, etcd, kubernetes, cilium
- Checkpoints per step, automatic rollback on failure
3. SecretumVault: Generates PKI certificates
- Certificates for etcd, kube-apiserver, kubelet
- Automatic rotation every 90 days
4. Kogral: Documents architecture decision
- ADR: "Why Cilium over Calico"
- Runbook: "How to scale cluster from 3 to 10 nodes"
5. Vapora: Orchestrates post-deployment
- Monitor Agent: Set up Prometheus + Grafana
- Security Agent: Vulnerability scanning
- DevOps Agent: Deploy test applications
Scenario 2: Production Incident (Database Outage)
1. Vapora Monitor Agent: Detects PostgreSQL down
- Alerts via NATS JetStream
- Triggers the incident response pipeline
2. Kogral: Claude Code queries via MCP
- "PostgreSQL outage postmortems?"
- Returns 3 similar incidents with resolutions
3. Vapora DevOps Agent: Executes runbook
- Restarts PostgreSQL with adjusted parameters
- Verifies health checks
4. SecretumVault: Rotates DB credentials
- Generates new dynamic credentials
- Updates applications via Database engine
5. Kogral: Documents postmortem
- Execution node with root cause, resolution, action items
- Linked to PostgreSQL configuration ADRs
Scenario 3: Post-Quantum Cryptography Migration
1. Kogral: Documents migration decision
- ADR: "Migration to ML-KEM-768 for quantum threat preparation"
- Timeline, risks, mitigation strategies
2. SecretumVault: Migrates secrets
- Backend change: openssl → oqs
- Re-encrypts secrets with ML-KEM-768
- Maintains compatibility with classical clients
3. Provisioning: Updates infrastructure
- Generates new PKI certificates with ML-DSA-65
- Deploys certificates to services (etcd, K8s API)
- Automatic rollback if health checks fail
4. Vapora: Orchestrates validation
- Security Agent: Verifies correct cryptography
- Monitor Agent: Validates that latency has not degraded
- DevOps Agent: Executes integration tests
5. TypeDialog: Self-service portal for teams
- Form: "Migrate service to PQC"
- prov-gen backend generates updated configuration
Scenario 4: CI/CD with AI Validation
1. Developer: Push to Git repository (Gitea)
2. Vapora DevOps Agent (triggered via webhook):
- Executes linting, unit tests
- Builds Docker image
- Runs vulnerability scan with Security Agent
3. TypeDialog: Deployment form
- Environment (staging/production)
- Canary rollout percentage
- Generates validated K8s configuration
4. Provisioning: Deploys with Tekton
- Applies K8s manifests with kubectl
- Automatic health checks
- Rollback if health check fails
5. SecretumVault: Injects secrets
- Dynamic DB credentials (TTL 1h)
- API keys from KV engine
- TLS certificates from PKI engine
6. Kogral: Records deployment
- Execution node with version, timestamp, author
- Link to commit SHA, PR, changes
Why Choose This Ecosystem (Ops Perspective)
Versus Alternatives
| Us | Terraform + Ansible + Vault |
|---|---|
| Typed configuration: Nickel with pre-runtime validation | YAML/HCL with limited type checking; many errors surface at plan or apply time |
| Integrated orchestration: Provisioning orchestrator with rollback | Imperative scripts, no automatic recovery |
| Post-Quantum crypto: SecretumVault with ML-KEM/ML-DSA today | Vault without PQC roadmap |
| Unified multi-cloud: One Nickel configuration for AWS/UpCloud/Local | Separate configurations per cloud |
| AI-native: MCP + RAG for assisted generation | No AI assistance, manual configuration |
| Full Rust stack: Performance, memory-safety | Mixed Python/Go/Shell with overhead |
Technical Investment (Ops Focus)
| Project / metric | Size / value |
|---|---|
| Provisioning: Nickel IaC, 80+ CLI shortcuts | ~40K LOC |
| SecretumVault: 4 crypto backends, 4 storage backends | ~11K LOC |
| Vapora: NATS JetStream, 12 agent roles | ~50K LOC |
| TypeDialog: 6 backends including prov-gen | ~90K LOC |
| Kogral: 6 node types, MCP server | ~15K LOC |
| Total tests | 4,360+ |
| Crypto backends | OpenSSL, OQS (PQC), AWS-LC, RustCrypto |
| Storage backends | FS, etcd, SurrealDB, PostgreSQL |
Getting Started (Adoption for Ops Teams)
Recommended Progressive Adoption
- SecretumVault: Secrets management with cryptographic agility (standalone)
- Kogral: Establish operational knowledge base (runbooks, ADRs, postmortems)
- TypeDialog: Configuration wizards for teams (CLI + Web + prov-gen)
- Provisioning: Multi-cloud declarative IaC with orchestrator
- Vapora: Orchestrate Ops agents with budget control (DevOps, Monitor, Security)
Each project works independently. Synergies emerge when combining them.
Quick Start per Project
SecretumVault:
# Docker Compose with etcd
docker-compose -f deploy/docker/docker-compose.yml up -d
# Initialize vault
curl -X POST http://localhost:8200/v1/sys/init -d '{"shares": 5, "threshold": 3}'
# Unseal with 3 shares
curl -X POST http://localhost:8200/v1/sys/unseal -d '{"key": "<share-1>"}'
curl -X POST http://localhost:8200/v1/sys/unseal -d '{"key": "<share-2>"}'
curl -X POST http://localhost:8200/v1/sys/unseal -d '{"key": "<share-3>"}'
# Enable PKI engine for certificates
svault secret engine enable pki
Kogral:
# Initialize knowledge repository
kogral init
# Add runbook
kogral add note "PostgreSQL Connection Pool Tuning" \
--tags "database,postgresql,performance"
# Add ADR
kogral add decision "Choose Cilium over Calico" \
--context "Need CNI for K8s with eBPF" \
--decision "Cilium for performance and observability" \
--consequences "Higher initial complexity, better long-term performance"
# Serve MCP server for Claude Code
kogral serve --port 3100
Provisioning:
# Clone repository
git clone https://repo.jesusperez.pro/jesus/provisioning
cd provisioning
# Configure provider (UpCloud in this example)
cp config/providers/upcloud.example.toml config/providers/upcloud.toml
# Edit with UpCloud credentials
# Create K8s cluster (Nickel definition)
cat > cluster.ncl <<EOF
{
provider = "upcloud",
region = "de-fra1",
servers = [
{ name = "k8s-cp-01", plan = "medium", role = "control-plane" },
{ name = "k8s-worker-01", plan = "medium", role = "worker" },
{ name = "k8s-worker-02", plan = "medium", role = "worker" },
],
taskservs = ["containerd", "etcd", "kubernetes", "cilium"],
}
EOF
# Validate configuration
nickel typecheck cluster.ncl
# Apply (orchestrator with checkpoints)
prov apply cluster.ncl --with-rollback
TypeDialog (prov-gen):
# Execute cluster configuration wizard
typedialog execute examples/ops/cluster-setup.toml \
--backend prov-gen \
--output my-cluster.ncl
# Generated configuration ready for Provisioning
nickel typecheck my-cluster.ncl
prov apply my-cluster.ncl
Vapora:
# Deploy with Docker Compose (backend + NATS + SurrealDB)
docker-compose up -d
# Create project
curl -X POST http://localhost:8001/projects \
-H "Content-Type: application/json" \
-d '{"name": "Infrastructure Automation", "description": "DevOps pipelines"}'
# Create task for DevOps Agent
curl -X POST http://localhost:8001/tasks \
-H "Content-Type: application/json" \
-d '{
"title": "Deploy Prometheus to K8s",
"task_type": "deployment",
"context": {"cluster": "prod-us-east-1", "namespace": "monitoring"}
}'
# Assign to DevOps Agent
curl -X POST http://localhost:8001/tasks/<task-id>/assign \
-H "Content-Type: application/json" \
-d '{"agent_role": "DevOps"}'
Contact
- Repositories: GitHub (private projects)
- Stack: Rust, Nickel, Nushell, SurrealDB, Axum
- License: Proprietary / To be defined
Modern infrastructure shouldn't require 10 disconnected tools. One ecosystem. Five projects. Real integration for Ops/DevOps.