# Ops/DevOps Portfolio: Modern Infrastructure End-to-End

## The Problem

DevOps and platform teams face critical challenges managing modern infrastructure:

- **Fragmented tools**: Terraform for IaC, Ansible for configuration, Vault for secrets, all disconnected
- **Untyped YAML**: Configuration errors that explode at runtime, not at compile time
- **Static cryptography**: No preparation for future quantum threats
- **Manual orchestration**: Fragile imperative scripts without rollback or recovery
- **Hidden costs**: No visibility into LLM spending for infrastructure generation
- **Complex multi-cloud**: Different APIs, configurations and tools per provider

## The Solution: An Integrated Ecosystem

Five projects designed to work together, covering the complete operations cycle.

---

## Provisioning: Declarative Infrastructure as Code

### Typed IaC with AI-Assisted Generation

Provisioning combines the precision of typed configuration (Nickel) with AI-assisted generation, eliminating fragile YAML and imperative scripts.

**Unique capabilities**:

- **Nickel IaC**: Typed configuration with lazy evaluation, pre-runtime validation
- **MCP Server**: Natural language queries about infrastructure
- **Integrated RAG**: 1,200+ domain documents for contextual responses
- **Multi-cloud**: AWS, UpCloud, local (LXD) from the same definition

**Hybrid orchestration**:

- Rust orchestrator for critical workflows (10-50x performance vs Python)
- Nushell scripts for flexibility and rapid prototyping
- Automatic dependency resolution (topological sorting)
- Checkpoints and automatic rollback on failures

**The workflow**:

```text
"I need a K8s cluster on AWS with 3 nodes and Cilium"
        ↓
MCP Server (NLP)
        ↓
RAG searches similar configurations
        ↓
Generates Nickel + validates types
        ↓
Orchestrator deploys:
  1. containerd (dependency)
  2. etcd (dependency)
  3. kubernetes (core)
  4. cilium (CNI)
  With checkpoints and automatic rollback
```

**Enterprise security**:

- JWT + MFA (TOTP + WebAuthn)
- Cedar policy engine for RBAC/ABAC
- 7-year audit log retention
- 5 KMS backends (RustyVault, Age, AWS KMS, Vault, Cosmian)
- SOPS/Age for configuration encryption at rest

**For whom**:

- DevOps teams wanting typed IaC, not fragile YAML
- Multi-cloud organizations (AWS + UpCloud + on-premise)
- Teams needing audit, compliance and enterprise security

**Expected results**:

- Configuration errors detected at compile time, not at runtime
- Infrastructure generated from natural language (MCP + RAG)
- Automatic rollback on failures with state management

---

## SecretumVault: Secrets Management with Post-Quantum Crypto

### Rust Vault with PQC in Production

SecretumVault is a secrets management system that implements **production-ready post-quantum cryptography** (ML-KEM-768, ML-DSA-65), providing cryptographic agility for organizations deploying today.

**Crypto-agnostic**:

- **OpenSSL**: RSA, ECDSA, AES-256-GCM (classical compatibility)
- **OQS (Post-Quantum)**: ML-KEM-768, ML-DSA-65 (NIST FIPS 203/204)
- **AWS-LC**: Experimental PQC (testing)
- **RustCrypto**: Pure-Rust implementations (testing)
- **Pluggable backends**: Change algorithms without modifying code

**Secrets engines**:

| Engine | Capability | Use cases |
| -------- | ------------ | ----------- |
| **KV** | Versioned secret storage | Credentials, API keys, sensitive configurations |
| **Transit** | Encryption-as-a-service with key rotation | Application data encryption, key rotation |
| **PKI** | X.509 certificate generation | mTLS, service mesh, internal infrastructure |
| **Database** | Dynamic credentials with TTL | PostgreSQL, MySQL, MongoDB credentials on-demand |

**Multi-backend storage**:

- **Filesystem**: Development, single-node, rapid prototyping
- **etcd**: Kubernetes, high availability, strong consistency
- **SurrealDB**: Complex queries, time-series, multi-tenant scopes
- **PostgreSQL**: Enterprise, ACID, complete auditing

**Enterprise security**:

- Shamir Secret Sharing for unsealing (configurable threshold)
- Cedar policy engine (ABAC, AWS-compatible)
- Native TLS/mTLS with X.509 certificates
- Complete audit logging with configurable retention
- Token management with TTL and renewal

**Ops/DevOps workflow**:

```bash
# Initialize vault with Shamir (5 shares, threshold 3)
svault operator init --shares 5 --threshold 3

# Unseal with 3 shares
svault operator unseal --share
svault operator unseal --share
svault operator unseal --share

# Enable Database engine for PostgreSQL
svault secret engine enable database
svault secret database config postgres-prod \
  plugin_name=postgresql-database-plugin \
  connection_url="postgresql://{{username}}:{{password}}@postgres:5432/mydb" \
  username="vault" password="vaultpass"

# Create role for dynamic credentials
# (PostgreSQL role names are identifiers, so they take double quotes, not single quotes)
svault secret database role create myapp-role \
  db_name=postgres-prod \
  creation_statements="CREATE USER \"{{name}}\" WITH PASSWORD '{{password}}' VALID UNTIL '{{expiration}}'; GRANT SELECT ON ALL TABLES IN SCHEMA public TO \"{{name}}\";" \
  default_ttl=1h max_ttl=24h

# Get dynamic credentials (generated on-demand)
svault secret read database/creds/myapp-role
# Key              Value
# ---              -----
# lease_id         database/creds/myapp-role/abc123
# lease_duration   3600
# username         v-myapp-role-xyz789
# password         A1b2C3d4E5f6G7h8

# Credentials are automatically revoked after 1h TTL
```

**For whom**:

- Teams deploying post-quantum cryptography today
- Organizations with cryptographic agility requirements
- Multi-cloud platforms needing Rust-native secrets management
- Security teams evaluating future quantum threats

**Expected results**:

- Preparation for quantum threats without changing architecture
- Secrets management with Rust memory guarantees
- Native integration with Provisioning (KMS) and Vapora (agent credentials)

---

## Vapora: Agent Orchestration with Cost Control

### Intelligent Agents for Operations

Vapora is not just for feature development. It is an orchestration platform that can coordinate specialized agents for DevOps operations.

**Available agents for Ops**:

- **DevOps**: CI/CD, pipelines, deployment automation
- **Monitor**: Health checks, alerting, real-time metrics
- **Security**: Auditing, compliance, vulnerability scanning
- **ProjectManager**: Roadmap, tracking, task coordination

**Real cost control for LLMs**:

- Budgets per role (monthly/weekly)
- Three levels: normal → near limit → exceeded
- Automatic fallback to cheaper providers without manual intervention
- Prometheus metrics: `vapora_budget_utilization`, `vapora_fallback_triggers`

**NATS JetStream coordination**:

```text
┌──────────────────────────────────────────────────────┐
│               NATS JetStream Messaging               │
├──────────────────────────────────────────────────────┤
│                                                      │
│  vapora.tasks.assign     → Task assignment           │
│  vapora.tasks.results    → Execution results         │
│  vapora.agents.heartbeat → Agent health check        │
│                                                      │
│  Persistence: JetStream streams                      │
│  Delivery: At-least-once with acknowledgment         │
│  Ordering: Per-subject message ordering              │
└──────────────────────────────────────────────────────┘
```

**Ops pipeline orchestration**:

```text
Pipeline: "Deploy microservice to K8s"

1. Security Agent: Docker image vulnerability scan
2. DevOps Agent: Validate K8s manifests + Helm charts
3. Monitor Agent: Setup Prometheus metrics + alerts
4. DevOps Agent: Deploy with kubectl apply + health check
5. Monitor Agent: Validate health endpoints + smoke tests

If any step fails: coordinated automatic rollback
```

**Metrics and observability**:

- Prometheus metrics endpoint (`/metrics`)
- OpenTelemetry integration (traces, spans)
- SurrealDB for execution storage
- Grafana dashboards for visualization

**For whom**:

- DevOps teams coordinating multiple LLM agents for operations
- Organizations needing to control LLM spending in automation
- Platforms with complex pipelines (CI/CD, deployment, monitoring)

**Expected results**:

- LLM cost reduction through intelligent routing
- Automatic orchestration of complex operational tasks
- Complete visibility of spending and performance per agent

---

## TypeDialog: Multi-Backend Forms for Configuration

### One Definition, Six Interfaces (Includes prov-gen)

TypeDialog unifies configuration capture across CLI, TUI and Web, and adds a specialized backend for multi-cloud IaC generation.

**Operational backends**:

| Backend | Typical Ops/DevOps use |
| --------- | ------------------------ |
| **CLI** | Automation scripts, CI/CD pipelines |
| **TUI** | Admin tools, terminal dashboards |
| **Web** | Self-service portals, team forms |
| **Prov-gen** | **Multi-cloud infrastructure generation** |

**Prov-gen Backend: IaC Generation**

The `prov-gen` backend generates Nickel infrastructure configurations for multiple clouds from typed forms:

```toml
# cluster-setup.toml
[form]
id = "k8s_cluster"
title = "Kubernetes Cluster Setup"

[[sections]]
id = "cloud"
title = "Cloud Provider"

[[sections.fields]]
id = "provider"
type = "select"
label = "Provider"
required = true
options = [
  { value = "aws", label = "AWS" },
  { value = "upcloud", label = "UpCloud" },
  { value = "local", label = "Local LXD" },
]

[[sections.fields]]
id = "region"
type = "text"
label = "Region"
required = true

[[sections]]
id = "cluster"
title = "Cluster Configuration"

[[sections.fields]]
id = "node_count"
type = "number"
label = "Node Count"
default = 3
validation.min = 1
validation.max = 20

[[sections.fields]]
id = "node_size"
type = "select"
label = "Node Size"
options = [
  { value = "small", label = "Small (2 CPU, 4GB RAM)" },
  { value = "medium", label = "Medium (4 CPU, 8GB RAM)" },
  { value = "large", label = "Large (8 CPU, 16GB RAM)" },
]

[output]
backend = "prov-gen"
format = "nickel"
validation = "nickel://schemas/kubernetes_cluster.ncl"
```

Execute with prov-gen:

```bash
typedialog execute cluster-setup.toml --backend prov-gen --output k8s-cluster.ncl
```

Generates Nickel IaC:

```nickel
# k8s-cluster.ncl (automatically generated)
{
  provider = "aws",
  region = "us-east-1",
  servers = [
    {
      name = "k8s-control-plane-01",
      plan = "medium",
      role = "control-plane",
      provider = "aws",
    },
    {
      name = "k8s-worker-01",
      plan = "medium",
      role = "worker",
      provider = "aws",
    },
    {
      name = "k8s-worker-02",
      plan = "medium",
      role = "worker",
      provider = "aws",
    },
  ],
  taskservs = [
    "containerd",
    "etcd",
    "kubernetes",
    "cilium",
  ],
  networking = {
    vpc_cidr = "10.0.0.0/16",
    pod_cidr = "10.244.0.0/16",
    service_cidr = "10.96.0.0/12",
  },
}
```

**Nickel contracts validation**:

```rust
// Automatic validation with Nickel schemas
let validator = NickelValidator::new();
let result = validator.validate(&generated_iac, "schemas/kubernetes_cluster.ncl")?;

if result.errors.is_empty() {
    // Valid IaC, ready for Provisioning
    provisioning_client.apply(&generated_iac).await?;
} else {
    // Validation errors, show to user
    eprintln!("Validation errors: {:?}", result.errors);
}
```

**For whom**:

- DevOps teams maintaining configuration wizards in CLI and Web
- Organizations with self-service infrastructure portals
- Teams needing IaC generation from forms

**Expected results**:

- One TOML definition for CLI, TUI, Web and IaC generation
- Typed validation before runtime with Nickel contracts
- Reduction of manual configuration errors

---

## Kogral: Knowledge Base for Platform Teams

### Your Ops Knowledge Base, Queryable

Kogral captures architectural decisions, runbooks, postmortems and operational procedures in a format that both humans and AI agents can query.

**6 specialized node types for Ops**:

| Type | Ops/DevOps use |
| ------ | ---------------- |
| **Note** | Runbooks, procedures, troubleshooting guides |
| **Decision** | Infrastructure ADRs (why AWS vs UpCloud, etcd vs Consul) |
| **Guideline** | Deployment standards, security policies |
| **Pattern** | Reusable infrastructure patterns (multi-AZ, HA) |
| **Journal** | Change logs, daily stand-up notes |
| **Execution** | Deployment history, rollbacks, incidents |

**Git-native + MCP for Claude Code**:

- Everything in versioned markdown (`.kogral/` directory)
- MCP server for Claude Code: agents query runbooks before executing
- Semantic search with fastembed (local) or cloud embeddings

**The Ops flow**:

```text
Production incident → Capture postmortem in Kogral as Execution
        ↓
Claude Code queries via MCP → "How did we resolve this error before?"
        ↓
Kogral responds with similar postmortems + runbooks
        ↓
Agent applies documented solution instead of guessing
```

**MCP Tools for Ops**:

```bash
# Search troubleshooting runbooks
kogral-mcp search "nginx 502 error troubleshooting"

# Add incident postmortem
kogral-mcp add-execution \
  --title "2026-01-22 PostgreSQL Connection Pool Exhaustion" \
  --context "Production database connections maxed out" \
  --resolution "Increased max_connections from 100 to 200, added PgBouncer" \
  --tags "database,incident,postgresql"

# Get deployment guidelines
kogral-mcp get-guidelines "kubernetes deployment" --include-shared true
```

**For whom**:

- Platform teams needing to preserve operational knowledge
- SRE teams with rotation losing context of previous incidents
- DevOps using Claude Code wanting contextualized runbooks

**Expected results**:

- New SRE onboarding in days, not weeks
- Incident resolution informed by previous postmortems
- Infrastructure decisions preserved and searchable

---

## The Ecosystem in Action: Ops Scenarios

### Scenario 1: New Multi-Cloud Kubernetes Cluster

```text
1. TypeDialog (prov-gen): Configuration wizard for cluster
   - Cloud provider, region, node count, node size
   - Generates validated Nickel IaC

2. Provisioning: Deploys infrastructure
   - Creates servers on AWS/UpCloud
   - Installs containerd, etcd, kubernetes, cilium
   - Checkpoints per step, automatic rollback on failure

3. SecretumVault: Generates PKI certificates
   - Certificates for etcd, kube-apiserver, kubelet
   - Automatic rotation every 90 days

4. Kogral: Documents architecture decision
   - ADR: "Why Cilium over Calico"
   - Runbook: "How to scale cluster from 3 to 10 nodes"

5. Vapora: Orchestrates post-deployment
   - Monitor Agent: Setup Prometheus + Grafana
   - Security Agent: Vulnerability scanning
   - DevOps Agent: Deploy test applications
```

### Scenario 2: Production Incident (Database Outage)

```text
1. Vapora Monitor Agent: Detects PostgreSQL down
   - Alert via NATS JetStream
   - Trigger incident response pipeline

2. Kogral: Claude Code queries via MCP
   - "PostgreSQL outage postmortems?"
   - Returns 3 similar incidents with resolutions

3. Vapora DevOps Agent: Executes runbook
   - Restarts PostgreSQL with adjusted parameters
   - Verifies health checks

4. SecretumVault: Rotates DB credentials
   - Generates new dynamic credentials
   - Updates applications via Database engine

5. Kogral: Documents postmortem
   - Execution node with root cause, resolution, action items
   - Linked to PostgreSQL configuration ADRs
```

### Scenario 3: Post-Quantum Cryptography Migration

```text
1. Kogral: Documents migration decision
   - ADR: "Migration to ML-KEM-768 for quantum threat preparation"
   - Timeline, risks, mitigation strategies

2. SecretumVault: Migrates secrets
   - Backend change: openssl → oqs
   - Re-encrypts secrets with ML-KEM-768
   - Maintains compatibility with classical clients

3. Provisioning: Updates infrastructure
   - Generates new PKI certificates with ML-DSA-65
   - Deploys certificates to services (etcd, K8s API)
   - Automatic rollback if health checks fail

4. Vapora: Orchestrates validation
   - Security Agent: Verifies correct cryptography
   - Monitor Agent: Validates that latency has not degraded
   - DevOps Agent: Executes integration tests

5. TypeDialog: Self-service portal for teams
   - Form: "Migrate service to PQC"
   - prov-gen backend generates updated configuration
```

### Scenario 4: CI/CD with AI Validation

```text
1. Developer: Push to Git repository (Gitea)

2. Vapora DevOps Agent (trigger via webhook):
   - Executes linting, unit tests
   - Build Docker image
   - Vulnerability scan with Security Agent

3. TypeDialog: Deployment form
   - Environment (staging/production)
   - Canary rollout percentage
   - Generates validated K8s configuration

4. Provisioning: Deploys with Tekton
   - Apply K8s manifests with kubectl
   - Automatic health checks
   - Rollback if health check fails

5. SecretumVault: Injects secrets
   - Dynamic DB credentials (TTL 1h)
   - API keys from KV engine
   - TLS certificates from PKI engine

6. Kogral: Records deployment
   - Execution node with version, timestamp, author
   - Link to commit SHA, PR, changes
```

---

## Why Choose This Ecosystem (Ops Perspective)

### Versus Alternatives

| This ecosystem | Terraform + Ansible + Vault |
| ---- | ----------------------------- |
| **Typed configuration**: Nickel with pre-runtime validation | YAML/HCL without types, errors at runtime |
| **Integrated orchestration**: Provisioning orchestrator with rollback | Imperative scripts, no automatic recovery |
| **Post-Quantum crypto**: SecretumVault with ML-KEM/ML-DSA today | Vault without PQC roadmap |
| **Unified multi-cloud**: One Nickel configuration for AWS/UpCloud/Local | Separate configurations per cloud |
| **AI-native**: MCP + RAG for assisted generation | No AI assistance, manual configuration |
| **Full Rust stack**: Performance, memory-safety | Mixed Python/Go/Shell with overhead |

### Technical Investment (Ops Focus)

| Project / metric | Size / value |
| -------- | ------- |
| **Provisioning**: Nickel IaC, 80+ CLI shortcuts | ~40K LOC |
| **SecretumVault**: 4 crypto backends, 4 storage backends | ~11K LOC |
| **Vapora**: NATS JetStream, 12 agent roles | ~50K LOC |
| **TypeDialog**: 6 backends including prov-gen | ~90K LOC |
| **Kogral**: 6 node types, MCP server | ~15K LOC |
| **Total tests** | 4,360+ |
| **Crypto backends** | OpenSSL, OQS (PQC), AWS-LC, RustCrypto |
| **Storage backends** | FS, etcd, SurrealDB, PostgreSQL |

---

## Getting Started (Adoption for Ops Teams)

### Recommended Progressive Adoption

1. **SecretumVault**: Secrets management with cryptographic agility (standalone)
2. **Kogral**: Establish operational knowledge base (runbooks, ADRs, postmortems)
3. **TypeDialog**: Configuration wizards for teams (CLI + Web + prov-gen)
4. **Provisioning**: Multi-cloud declarative IaC with orchestrator
5. **Vapora**: Orchestrate Ops agents with budget control (DevOps, Monitor, Security)

Each project works independently. Synergies emerge when combining them.

### Quick Start per Project

**SecretumVault**:

```bash
# Docker Compose with etcd
docker-compose -f deploy/docker/docker-compose.yml up -d

# Initialize vault
curl -X POST http://localhost:8200/v1/sys/init -d '{"shares": 5, "threshold": 3}'

# Unseal with 3 shares
curl -X POST http://localhost:8200/v1/sys/unseal -d '{"key": ""}'
curl -X POST http://localhost:8200/v1/sys/unseal -d '{"key": ""}'
curl -X POST http://localhost:8200/v1/sys/unseal -d '{"key": ""}'

# Enable PKI engine for certificates
svault secret engine enable pki
```

**Kogral**:

```bash
# Initialize knowledge repository
kogral init

# Add runbook
kogral add note "PostgreSQL Connection Pool Tuning" \
  --tags "database,postgresql,performance"

# Add ADR
kogral add decision "Choose Cilium over Calico" \
  --context "Need CNI for K8s with eBPF" \
  --decision "Cilium for performance and observability" \
  --consequences "Higher initial complexity, better long-term performance"

# Serve MCP server for Claude Code
kogral serve --port 3100
```

**Provisioning**:

```bash
# Clone repository
git clone https://repo.jesusperez.pro/jesus/provisioning
cd provisioning

# Configure provider (UpCloud in this example)
cp config/providers/upcloud.example.toml config/providers/upcloud.toml
# Edit with UpCloud credentials

# Create K8s cluster (Nickel definition)
cat > cluster.ncl </assign \
  -H "Content-Type: application/json" \
  -d '{"agent_role": "DevOps"}'
```

---

## Contact

- **Repositories**: GitHub (private projects)
- **Stack**: Rust, Nickel, Nushell, SurrealDB, Axum
- **License**: Proprietary / To be defined

---

*Modern infrastructure shouldn't require 10 disconnected tools.*
*One ecosystem. Five projects. Real integration for Ops/DevOps.*
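
The Provisioning section mentions automatic dependency resolution through topological sorting, with containerd and etcd installed before kubernetes and cilium last. The sketch below illustrates that idea with Kahn's algorithm using only the Rust standard library. It is not the orchestrator's actual code: the function name `install_order` and the dependency edges used in the usage note are assumptions made for illustration.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// Hypothetical helper: given `component -> components it requires`,
/// return an order in which every component comes after its requirements.
/// On a dependency cycle, return the components that could not be scheduled.
fn install_order<'a>(
    deps: &BTreeMap<&'a str, Vec<&'a str>>,
) -> Result<Vec<&'a str>, Vec<&'a str>> {
    // Collect every node, including ones that only appear as a requirement.
    let mut nodes: BTreeSet<&'a str> = BTreeSet::new();
    for (name, requires) in deps {
        nodes.insert(*name);
        nodes.extend(requires.iter().copied());
    }

    // unmet[n]      = number of requirements of n not yet installed
    // dependents[d] = components that are waiting on d
    let mut unmet: BTreeMap<&'a str, usize> = nodes.iter().map(|n| (*n, 0)).collect();
    let mut dependents: BTreeMap<&'a str, Vec<&'a str>> = BTreeMap::new();
    for (name, requires) in deps {
        for dep in requires {
            *unmet.get_mut(name).expect("node registered above") += 1;
            dependents.entry(*dep).or_default().push(*name);
        }
    }

    // Start with components that have no requirements (sorted, so output is deterministic).
    let mut ready: VecDeque<&'a str> = unmet
        .iter()
        .filter(|(_, count)| **count == 0)
        .map(|(name, _)| *name)
        .collect();

    let mut order = Vec::with_capacity(nodes.len());
    while let Some(node) = ready.pop_front() {
        order.push(node);
        if let Some(children) = dependents.get(&node) {
            for child in children {
                let count = unmet.get_mut(child).expect("node registered above");
                *count -= 1;
                if *count == 0 {
                    ready.push_back(*child);
                }
            }
        }
    }

    if order.len() == nodes.len() {
        Ok(order)
    } else {
        // Whatever still has unmet requirements is part of, or blocked by, a cycle.
        Err(unmet
            .into_iter()
            .filter(|(_, count)| *count > 0)
            .map(|(name, _)| name)
            .collect())
    }
}
```

Assuming `kubernetes` requires `containerd` and `etcd`, and `cilium` requires `kubernetes`, calling `install_order` on that map returns `containerd`, `etcd`, `kubernetes`, `cilium`, which matches the deployment order shown in the workflow. A real orchestrator would attach a checkpoint to each step of the returned order so that a failure can roll back in reverse.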