# Ops/DevOps Portfolio: Modern Infrastructure End-to-End

## The Problem

DevOps and platform teams face critical challenges managing modern infrastructure:

- **Fragmented tools**: Terraform for IaC, Ansible for configuration, Vault for secrets, all disconnected
- **Untyped YAML**: Configuration errors that explode at runtime, not at compile time
- **Static cryptography**: No preparation for future quantum threats
- **Manual orchestration**: Fragile imperative scripts without rollback or recovery
- **Hidden costs**: No visibility into LLM spending for infrastructure generation
- **Complex multi-cloud**: Different APIs, configurations and tools per provider

## The Solution: An Integrated Ecosystem

Five projects designed to work together, covering the complete operations cycle.

---

## Provisioning: Declarative Infrastructure as Code

### Typed IaC with AI-Assisted Generation

Provisioning combines the precision of typed configuration (Nickel) with AI-assisted generation, eliminating fragile YAML and imperative scripts.

**Unique capabilities**:

- **Nickel IaC**: Typed configuration with lazy evaluation, pre-runtime validation
- **MCP Server**: Natural language queries about infrastructure
- **Integrated RAG**: 1,200+ domain documents for contextual responses
- **Multi-cloud**: AWS, UpCloud, local (LXD) from the same definition

**Hybrid orchestration**:

- Rust orchestrator for critical workflows (10-50x performance vs Python)
- Nushell scripts for flexibility and rapid prototyping
- Automatic dependency resolution (topological sorting)
- Checkpoints and automatic rollback on failures

**The workflow**:

```text
"I need a K8s cluster on AWS with 3 nodes and Cilium"
        ↓
MCP Server (NLP)
        ↓
RAG searches similar configurations
        ↓
Generates Nickel + validates types
        ↓
Orchestrator deploys:
  1. containerd (dependency)
  2. etcd (dependency)
  3. kubernetes (core)
  4. cilium (CNI)
With checkpoints and automatic rollback
```

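
The deployment order in the workflow above is what a topological sort of the service dependencies produces. The following is a minimal sketch using Kahn's algorithm, not the orchestrator's actual code: the function names, the map-based graph shape, and the exact dependency edges are assumptions chosen to reproduce the order shown. A cyclic graph yields `None`, which is the point where an orchestrator would refuse to deploy.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// Returns an order in which every service comes after its dependencies,
/// or `None` if the dependency graph contains a cycle.
/// `deps` maps each service to the services it depends on (hypothetical shape).
pub fn deploy_order<'a>(deps: &BTreeMap<&'a str, Vec<&'a str>>) -> Option<Vec<String>> {
    // Collect every node, including ones that only appear as dependencies.
    let mut nodes: BTreeSet<&str> = deps.keys().copied().collect();
    for list in deps.values() {
        nodes.extend(list.iter().copied());
    }

    // in_degree[s] = number of unmet dependencies of s.
    let mut in_degree: BTreeMap<&str, usize> = nodes.iter().map(|n| (*n, 0)).collect();
    // dependents[d] = services that must wait for d.
    let mut dependents: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for (svc, list) in deps {
        for dep in list {
            *in_degree.get_mut(svc).unwrap() += 1;
            dependents.entry(*dep).or_default().push(*svc);
        }
    }

    // Start with services that have no dependencies at all.
    let mut ready: VecDeque<&str> = in_degree
        .iter()
        .filter(|(_, d)| **d == 0)
        .map(|(n, _)| *n)
        .collect();
    let mut order = Vec::new();
    while let Some(svc) = ready.pop_front() {
        order.push(svc.to_string());
        for next in dependents.get(svc).into_iter().flatten() {
            let d = in_degree.get_mut(next).unwrap();
            *d -= 1;
            if *d == 0 {
                ready.push_back(*next);
            }
        }
    }
    // If some node was never released, the graph has a cycle.
    (order.len() == nodes.len()).then_some(order)
}

/// Dependency graph mirroring the workflow above (edges are illustrative).
pub fn example_graph() -> BTreeMap<&'static str, Vec<&'static str>> {
    BTreeMap::from([
        ("containerd", vec![]),
        ("etcd", vec!["containerd"]),
        ("kubernetes", vec!["containerd", "etcd"]),
        ("cilium", vec!["kubernetes"]),
    ])
}
```
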
**Enterprise security**:

- JWT + MFA (TOTP + WebAuthn)
- Cedar policy engine for RBAC/ABAC
- 7-year audit log retention
- 5 KMS backends (RustyVault, Age, AWS KMS, Vault, Cosmian)
- SOPS/Age for configuration encryption at rest

**For whom**:

- DevOps teams that want typed IaC instead of fragile YAML
- Multi-cloud organizations (AWS + UpCloud + on-premise)
- Teams that need audit, compliance and enterprise security

**Expected results**:

- Configuration errors detected at compile time, not at runtime
- Infrastructure generated from natural language (MCP + RAG)
- Automatic rollback on failures with state management

---

## SecretumVault: Secrets Management with Post-Quantum Crypto

### Rust Vault with PQC in Production

SecretumVault is a secrets management system that implements **production-ready post-quantum cryptography** (ML-KEM-768, ML-DSA-65), providing cryptographic agility for organizations deploying today.

**Crypto-agnostic**:

- **OpenSSL**: RSA, ECDSA, AES-256-GCM (classical compatibility)
- **OQS (Post-Quantum)**: ML-KEM-768, ML-DSA-65 (NIST FIPS 203/204)
- **AWS-LC**: Experimental PQC (testing)
- **RustCrypto**: Pure-Rust implementations (testing)
- **Pluggable backends**: Change algorithms without modifying code

**Secrets engines**:

| Engine | Capability | Use cases |
| -------- | ------------ | ----------- |
| **KV** | Versioned secret storage | Credentials, API keys, sensitive configurations |
| **Transit** | Encryption-as-a-service with key rotation | Application data encryption, key rotation |
| **PKI** | X.509 certificate generation | mTLS, service mesh, internal infrastructure |
| **Database** | Dynamic credentials with TTL | PostgreSQL, MySQL, MongoDB credentials on-demand |

**Multi-backend storage**:

- **Filesystem**: Development, single-node, rapid prototyping
- **etcd**: Kubernetes, high availability, strong consistency
- **SurrealDB**: Complex queries, time-series, multi-tenant scopes
- **PostgreSQL**: Enterprise, ACID, complete auditing

**Enterprise security**:

- Shamir Secret Sharing for unsealing (configurable threshold)
- Cedar policy engine (ABAC, AWS-compatible)
- Native TLS/mTLS with X.509 certificates
- Complete audit logging with configurable retention
- Token management with TTL and renewal

**Ops/DevOps workflow**:

```bash
# Initialize vault with Shamir (5 shares, threshold 3)
svault operator init --shares 5 --threshold 3

# Unseal with 3 shares
svault operator unseal --share <share-1>
svault operator unseal --share <share-2>
svault operator unseal --share <share-3>

# Enable Database engine for PostgreSQL
svault secret engine enable database
svault secret database config postgres-prod \
  plugin_name=postgresql-database-plugin \
  connection_url="postgresql://{{username}}:{{password}}@postgres:5432/mydb" \
  username="vault" password="vaultpass"

# Create role for dynamic credentials
# (PostgreSQL role names are identifiers, so they take double quotes;
#  only the password and expiration are string literals)
svault secret database role create myapp-role \
  db_name=postgres-prod \
  creation_statements="CREATE USER \"{{name}}\" WITH PASSWORD '{{password}}' VALID UNTIL '{{expiration}}'; GRANT SELECT ON ALL TABLES IN SCHEMA public TO \"{{name}}\";" \
  default_ttl=1h max_ttl=24h

# Get dynamic credentials (generated on-demand)
svault secret read database/creds/myapp-role
# Key              Value
# ---              -----
# lease_id         database/creds/myapp-role/abc123
# lease_duration   3600
# username         v-myapp-role-xyz789
# password         A1b2C3d4E5f6G7h8

# Credentials are automatically revoked after 1h TTL
```

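
The `init` and `unseal` steps above rely on threshold secret sharing: any 3 of the 5 shares rebuild the key, while fewer do not. The sketch below illustrates the idea with Shamir's scheme over a prime field. It is a toy under stated assumptions, not SecretumVault's implementation: the function names, the modulus, and the integer encoding of the secret are all hypothetical, and the polynomial coefficients are passed in explicitly, whereas a real implementation must draw them from a cryptographically secure random source.

```rust
/// Prime modulus of the finite field (2^61 - 1, a Mersenne prime).
pub const P: u128 = (1u128 << 61) - 1;

/// Modular exponentiation by squaring, used for modular inverses.
fn pow_mod(mut base: u128, mut exp: u128) -> u128 {
    let mut acc = 1u128;
    base %= P;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % P;
        }
        base = base * base % P;
        exp >>= 1;
    }
    acc
}

/// Splits `secret` (< P) into `n` shares `(x, f(x))` of the polynomial
/// f(x) = secret + c1*x + c2*x^2 + ... with `coeffs = [c1, c2, ...]`.
/// The threshold is `coeffs.len() + 1`.
pub fn split(secret: u128, coeffs: &[u128], n: u128) -> Vec<(u128, u128)> {
    (1..=n)
        .map(|x| {
            // Horner evaluation of the non-constant part, then add the secret.
            let high = coeffs.iter().rev().fold(0u128, |acc, c| (acc * x + c % P) % P);
            (x, (high * x + secret) % P)
        })
        .collect()
}

/// Recovers f(0) from at least `threshold` distinct shares by Lagrange
/// interpolation at x = 0.
pub fn combine(shares: &[(u128, u128)]) -> u128 {
    let mut secret = 0u128;
    for (i, &(xi, yi)) in shares.iter().enumerate() {
        let (mut num, mut den) = (1u128, 1u128);
        for (j, &(xj, _)) in shares.iter().enumerate() {
            if i != j {
                num = num * xj % P;
                den = den * ((xj + P - xi) % P) % P;
            }
        }
        // Division is multiplication by the inverse (Fermat's little theorem).
        let term = yi * num % P * pow_mod(den, P - 2) % P;
        secret = (secret + term) % P;
    }
    secret
}
```
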
**For whom**:

- Teams deploying post-quantum cryptography today
- Organizations with cryptographic agility requirements
- Multi-cloud platforms needing Rust-native secrets management
- Security teams evaluating future quantum threats

**Expected results**:

- Preparation for quantum threats without changing architecture
- Secrets management with Rust memory guarantees
- Native integration with Provisioning (KMS) and Vapora (agent credentials)

---

## Vapora: Agent Orchestration with Cost Control

### Intelligent Agents for Operations

Vapora is not just for feature development. It's an orchestration platform that can coordinate specialized agents for DevOps operations.

**Available agents for Ops**:

- **DevOps**: CI/CD, pipelines, deployment automation
- **Monitor**: Health checks, alerting, real-time metrics
- **Security**: Auditing, compliance, vulnerability scanning
- **ProjectManager**: Roadmap, tracking, task coordination

**Real cost control for LLMs**:

- Budgets per role (monthly/weekly)
- Three levels: normal → near limit → exceeded
- Automatic fallback to cheaper providers without manual intervention
- Prometheus metrics: `vapora_budget_utilization`, `vapora_fallback_triggers`

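
The three budget levels and the fallback rule can be pictured as a small classification step followed by provider selection. This is a sketch under assumptions, not Vapora's routing code: the 80% "near limit" threshold, the type and function names, and the rule "first provider is preferred, otherwise pick the cheapest" are all hypothetical.

```rust
/// Budget state for one agent role. The three tiers come from the list
/// above; the 80% "near limit" threshold is an illustrative assumption.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum BudgetTier {
    Normal,
    NearLimit,
    Exceeded,
}

pub const NEAR_LIMIT_PERCENT: u64 = 80;

/// Classifies spending against a limit, both in cents.
pub fn classify(spent_cents: u64, limit_cents: u64) -> BudgetTier {
    if spent_cents >= limit_cents {
        BudgetTier::Exceeded
    } else if spent_cents * 100 >= limit_cents * NEAR_LIMIT_PERCENT {
        BudgetTier::NearLimit
    } else {
        BudgetTier::Normal
    }
}

/// An LLM provider with a (hypothetical) price per 1k tokens.
pub struct Provider {
    pub name: &'static str,
    pub cents_per_1k_tokens: u64,
}

/// Uses the preferred (first) provider while the budget is normal and
/// falls back to the cheapest provider otherwise.
pub fn select_provider(tier: BudgetTier, providers: &[Provider]) -> Option<&Provider> {
    match tier {
        BudgetTier::Normal => providers.first(),
        BudgetTier::NearLimit | BudgetTier::Exceeded => {
            providers.iter().min_by_key(|p| p.cents_per_1k_tokens)
        }
    }
}
```
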
**NATS JetStream coordination**:

```text
┌──────────────────────────────────────────────────────┐
│               NATS JetStream Messaging               │
├──────────────────────────────────────────────────────┤
│                                                      │
│  vapora.tasks.assign     → Task assignment           │
│  vapora.tasks.results    → Execution results         │
│  vapora.agents.heartbeat → Agent health check        │
│                                                      │
│  Persistence: JetStream streams                      │
│  Delivery: At-least-once with acknowledgment         │
│  Ordering: Per-subject message ordering              │
└──────────────────────────────────────────────────────┘
```

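
At-least-once delivery means a task message can arrive more than once, for example when an acknowledgment is lost, so consumers need to be idempotent. The sketch below shows one common approach, deduplicating by message ID. It does not use the NATS client API; the `TaskMessage` fields and the `IdempotentConsumer` type are hypothetical names for illustration, and a production consumer would persist the seen IDs rather than keep them in memory.

```rust
use std::collections::HashSet;

/// A task message as it might arrive on `vapora.tasks.assign`.
/// Field names are hypothetical.
pub struct TaskMessage {
    pub id: String,
    pub payload: String,
}

/// Remembers processed message IDs so that redelivered messages are
/// acknowledged without being executed a second time.
#[derive(Default)]
pub struct IdempotentConsumer {
    seen: HashSet<String>,
    pub executed: Vec<String>,
}

impl IdempotentConsumer {
    /// Returns `true` if the task was executed and `false` if it was a
    /// duplicate. In both cases the caller should acknowledge the message.
    pub fn handle(&mut self, msg: &TaskMessage) -> bool {
        if !self.seen.insert(msg.id.clone()) {
            return false; // duplicate delivery: ack, but do not run again
        }
        self.executed.push(msg.payload.clone());
        true
    }
}
```
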
**Ops pipeline orchestration**:

```text
Pipeline: "Deploy microservice to K8s"

1. Security Agent: Docker image vulnerability scan
2. DevOps Agent: Validate K8s manifests + Helm charts
3. Monitor Agent: Set up Prometheus metrics + alerts
4. DevOps Agent: Deploy with kubectl apply + health check
5. Monitor Agent: Validate health endpoints + smoke tests

If any step fails: coordinated automatic rollback
```

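
"Coordinated automatic rollback" can be read as: run the steps in order, and on the first failure undo the steps that already completed, newest first. The sketch below shows that pattern with compensating actions. The `Step` type and `run_pipeline` function are hypothetical and only illustrate the control flow; they are not Vapora's API.

```rust
/// One pipeline step: a name, an action, and a compensating undo action.
pub struct Step<'a> {
    pub name: &'a str,
    pub run: Box<dyn Fn() -> Result<(), String> + 'a>,
    pub undo: Box<dyn Fn() + 'a>,
}

/// Runs the steps in order. On the first failure, undoes the completed
/// steps in reverse order and returns the failed step's name and error.
/// The failed step itself is not undone, since it did not complete.
pub fn run_pipeline(steps: &[Step<'_>]) -> Result<(), (String, String)> {
    let mut completed: Vec<&Step<'_>> = Vec::new();
    for step in steps {
        match (step.run)() {
            Ok(()) => completed.push(step),
            Err(e) => {
                for done in completed.iter().rev() {
                    (done.undo)();
                }
                return Err((step.name.to_string(), e));
            }
        }
    }
    Ok(())
}
```
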
**Metrics and observability**:

- Prometheus metrics endpoint (`/metrics`)
- OpenTelemetry integration (traces, spans)
- SurrealDB for execution storage
- Grafana dashboards for visualization

**For whom**:

- DevOps teams coordinating multiple LLM agents for operations
- Organizations needing to control LLM spending in automation
- Platforms with complex pipelines (CI/CD, deployment, monitoring)

**Expected results**:

- LLM cost reduction through intelligent routing
- Automatic orchestration of complex operational tasks
- Complete visibility of spending and performance per agent

---

## TypeDialog: Multi-Backend Forms for Configuration

### One Definition, Six Interfaces (Includes prov-gen)

TypeDialog unifies configuration capture across CLI, TUI, and Web, and adds a specialized backend for multi-cloud IaC generation.

**Operational backends**:

| Backend | Typical Ops/DevOps use |
| --------- | ------------------------ |
| **CLI** | Automation scripts, CI/CD pipelines |
| **TUI** | Admin tools, terminal dashboards |
| **Web** | Self-service portals, team forms |
| **Prov-gen** | **Multi-cloud infrastructure generation** |

**Prov-gen Backend: IaC Generation**

The `prov-gen` backend generates Nickel infrastructure configurations for multiple clouds from typed forms:

```toml
# cluster-setup.toml
[form]
id = "k8s_cluster"
title = "Kubernetes Cluster Setup"

[[sections]]
id = "cloud"
title = "Cloud Provider"

[[sections.fields]]
id = "provider"
type = "select"
label = "Provider"
required = true
options = [
  { value = "aws", label = "AWS" },
  { value = "upcloud", label = "UpCloud" },
  { value = "local", label = "Local LXD" },
]

[[sections.fields]]
id = "region"
type = "text"
label = "Region"
required = true

[[sections]]
id = "cluster"
title = "Cluster Configuration"

[[sections.fields]]
id = "node_count"
type = "number"
label = "Node Count"
default = 3
validation.min = 1
validation.max = 20

[[sections.fields]]
id = "node_size"
type = "select"
label = "Node Size"
options = [
  { value = "small", label = "Small (2 CPU, 4GB RAM)" },
  { value = "medium", label = "Medium (4 CPU, 8GB RAM)" },
  { value = "large", label = "Large (8 CPU, 16GB RAM)" },
]

[output]
backend = "prov-gen"
format = "nickel"
validation = "nickel://schemas/kubernetes_cluster.ncl"
```

Execute with prov-gen:

```bash
typedialog execute cluster-setup.toml --backend prov-gen --output k8s-cluster.ncl
```

Generates Nickel IaC:

```nickel
# k8s-cluster.ncl (automatically generated)
{
  provider = "aws",
  region = "us-east-1",

  servers = [
    {
      name = "k8s-control-plane-01",
      plan = "medium",
      role = "control-plane",
      provider = "aws",
    },
    {
      name = "k8s-worker-01",
      plan = "medium",
      role = "worker",
      provider = "aws",
    },
    {
      name = "k8s-worker-02",
      plan = "medium",
      role = "worker",
      provider = "aws",
    },
  ],

  taskservs = [
    "containerd",
    "etcd",
    "kubernetes",
    "cilium",
  ],

  networking = {
    vpc_cidr = "10.0.0.0/16",
    pod_cidr = "10.244.0.0/16",
    service_cidr = "10.96.0.0/12",
  },
}
```

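
One way to picture how form answers become the `servers` list above is a small expansion step: validate `node_count` against the form's `validation.min`/`validation.max`, then emit one control-plane node plus workers. This is a sketch, not prov-gen's generator. The `Server` type and `expand_servers` function are hypothetical, and the "one control plane, the rest workers" split is inferred from the sample output rather than documented behavior.

```rust
/// A server entry like the ones in the generated `servers` list.
#[derive(Debug, PartialEq, Eq)]
pub struct Server {
    pub name: String,
    pub plan: String,
    pub role: &'static str,
}

/// Expands a node count into one control-plane node plus workers.
/// The 1..=20 range mirrors `validation.min` / `validation.max` in the form.
pub fn expand_servers(node_count: u32, plan: &str) -> Result<Vec<Server>, String> {
    if !(1..=20).contains(&node_count) {
        return Err(format!("node_count must be between 1 and 20, got {node_count}"));
    }
    let mut servers = vec![Server {
        name: "k8s-control-plane-01".to_string(),
        plan: plan.to_string(),
        role: "control-plane",
    }];
    // Remaining nodes become workers, numbered from 01.
    for i in 1..node_count {
        servers.push(Server {
            name: format!("k8s-worker-{i:02}"),
            plan: plan.to_string(),
            role: "worker",
        });
    }
    Ok(servers)
}
```
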
**Nickel contracts validation**:

```rust
// Automatic validation with Nickel schemas
let validator = NickelValidator::new();
let result = validator.validate(&generated_iac, "schemas/kubernetes_cluster.ncl")?;

if result.errors.is_empty() {
    // Valid IaC, ready for Provisioning
    provisioning_client.apply(&generated_iac).await?;
} else {
    // Validation errors, show to user
    eprintln!("Validation errors: {:?}", result.errors);
}
```

**For whom**:

- DevOps teams maintaining configuration wizards in CLI and Web
- Organizations with self-service infrastructure portals
- Teams needing IaC generation from forms

**Expected results**:

- One TOML definition for CLI, TUI, Web and IaC generation
- Typed validation before runtime with Nickel contracts
- Reduction of manual configuration errors

---

## Kogral: Knowledge Base for Platform Teams

### Your Ops Knowledge Base, Queryable

Kogral captures architectural decisions, runbooks, postmortems and operational procedures in a format that both humans and AI agents can query.

**6 specialized node types for Ops**:

| Type | Ops/DevOps use |
| ------ | ---------------- |
| **Note** | Runbooks, procedures, troubleshooting guides |
| **Decision** | Infrastructure ADRs (why AWS vs UpCloud, etcd vs Consul) |
| **Guideline** | Deployment standards, security policies |
| **Pattern** | Reusable infrastructure patterns (multi-AZ, HA) |
| **Journal** | Change logs, daily stand-up notes |
| **Execution** | Deployment history, rollbacks, incidents |

**Git-native + MCP for Claude Code**:

- Everything in versioned markdown (`.kogral/` directory)
- MCP server for Claude Code: agents query runbooks before executing
- Semantic search with fastembed (local) or cloud embeddings

**The Ops flow**:

```text
Production incident → Capture postmortem in Kogral as Execution
        ↓
Claude Code queries via MCP → "How did we resolve this error before?"
        ↓
Kogral responds with similar postmortems + runbooks
        ↓
Agent applies documented solution instead of guessing
```

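
"Similar postmortems" in the flow above implies a similarity ranking over embeddings. A common way to do this is cosine similarity between the query vector and each document vector. The sketch below assumes that approach; it is not Kogral's search code, the function names are hypothetical, and the two-dimensional vectors in the usage stand in for real embeddings, which have hundreds of dimensions.

```rust
/// Cosine similarity between two embedding vectors of equal length.
/// Returns 0.0 if either vector has zero norm.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm = |v: &[f32]| v.iter().map(|x| x * x).sum::<f32>().sqrt();
    let denom = norm(a) * norm(b);
    if denom == 0.0 {
        0.0
    } else {
        dot / denom
    }
}

/// Returns document titles ordered from most to least similar to the query.
pub fn rank<'a>(query: &[f32], docs: &'a [(&'a str, Vec<f32>)]) -> Vec<&'a str> {
    let mut scored: Vec<(&str, f32)> = docs
        .iter()
        .map(|(title, embedding)| (*title, cosine(query, embedding)))
        .collect();
    // Highest similarity first.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().map(|(title, _)| title).collect()
}
```
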
**MCP Tools for Ops**:

```bash
# Search troubleshooting runbooks
kogral-mcp search "nginx 502 error troubleshooting"

# Add incident postmortem
kogral-mcp add-execution \
  --title "2026-01-22 PostgreSQL Connection Pool Exhaustion" \
  --context "Production database connections maxed out" \
  --resolution "Increased max_connections from 100 to 200, added PgBouncer" \
  --tags "database,incident,postgresql"

# Get deployment guidelines
kogral-mcp get-guidelines "kubernetes deployment" --include-shared true
```

**For whom**:

- Platform teams that need to preserve operational knowledge
- SRE teams whose on-call rotation loses context from previous incidents
- DevOps engineers using Claude Code who want contextualized runbooks

**Expected results**:

- New SRE onboarding in days, not weeks
- Incident resolution informed by previous postmortems
- Infrastructure decisions preserved and searchable

---

## The Ecosystem in Action: Ops Scenarios

### Scenario 1: New Multi-Cloud Kubernetes Cluster

```text
1. TypeDialog (prov-gen): Configuration wizard for cluster
   - Cloud provider, region, node count, node size
   - Generates validated Nickel IaC

2. Provisioning: Deploys infrastructure
   - Creates servers on AWS/UpCloud
   - Installs containerd, etcd, kubernetes, cilium
   - Checkpoints per step, automatic rollback if a step fails

3. SecretumVault: Generates PKI certificates
   - Certificates for etcd, kube-apiserver, kubelet
   - Automatic rotation every 90 days

4. Kogral: Documents architecture decision
   - ADR: "Why Cilium over Calico"
   - Runbook: "How to scale cluster from 3 to 10 nodes"

5. Vapora: Orchestrates post-deployment
   - Monitor Agent: Set up Prometheus + Grafana
   - Security Agent: Vulnerability scanning
   - DevOps Agent: Deploy test applications
```

### Scenario 2: Production Incident (Database Outage)

```text
1. Vapora Monitor Agent: Detects PostgreSQL down
   - Alerts via NATS JetStream
   - Triggers incident response pipeline

2. Kogral: Claude Code queries via MCP
   - "PostgreSQL outage postmortems?"
   - Returns 3 similar incidents with resolutions

3. Vapora DevOps Agent: Executes runbook
   - Restarts PostgreSQL with adjusted parameters
   - Verifies health checks

4. SecretumVault: Rotates DB credentials
   - Generates new dynamic credentials
   - Updates applications via Database engine

5. Kogral: Documents postmortem
   - Execution node with root cause, resolution, action items
   - Linked to PostgreSQL configuration ADRs
```

### Scenario 3: Post-Quantum Cryptography Migration

```text
1. Kogral: Documents migration decision
   - ADR: "Migration to ML-KEM-768 for quantum threat preparation"
   - Timeline, risks, mitigation strategies

2. SecretumVault: Migrates secrets
   - Backend change: openssl → oqs
   - Re-encrypts secrets with ML-KEM-768
   - Maintains compatibility with classical clients

3. Provisioning: Updates infrastructure
   - Generates new PKI certificates with ML-DSA-65
   - Deploys certificates to services (etcd, K8s API)
   - Automatic rollback if health checks fail

4. Vapora: Orchestrates validation
   - Security Agent: Verifies correct cryptography
   - Monitor Agent: Validates that latency has not degraded
   - DevOps Agent: Executes integration tests

5. TypeDialog: Self-service portal for teams
   - Form: "Migrate service to PQC"
   - prov-gen backend generates updated configuration
```

### Scenario 4: CI/CD with AI Validation

```text
1. Developer: Push to Git repository (Gitea)

2. Vapora DevOps Agent (triggered via webhook):
   - Executes linting, unit tests
   - Builds Docker image
   - Vulnerability scan with Security Agent

3. TypeDialog: Deployment form
   - Environment (staging/production)
   - Canary rollout percentage
   - Generates validated K8s configuration

4. Provisioning: Deploys with Tekton
   - Applies K8s manifests with kubectl
   - Automatic health checks
   - Rollback if health check fails

5. SecretumVault: Injects secrets
   - Dynamic DB credentials (TTL 1h)
   - API keys from KV engine
   - TLS certificates from PKI engine

6. Kogral: Records deployment
   - Execution node with version, timestamp, author
   - Link to commit SHA, PR, changes
```

---

## Why Choose This Ecosystem (Ops Perspective)

### Versus Alternatives

| Us | Terraform + Ansible + Vault |
| ---- | ----------------------------- |
| **Typed configuration**: Nickel with pre-runtime validation | YAML/HCL without types, errors at runtime |
| **Integrated orchestration**: Provisioning orchestrator with rollback | Imperative scripts, no automatic recovery |
| **Post-Quantum crypto**: SecretumVault with ML-KEM/ML-DSA today | Vault without PQC roadmap |
| **Unified multi-cloud**: One Nickel configuration for AWS/UpCloud/Local | Separate configurations per cloud |
| **AI-native**: MCP + RAG for assisted generation | No AI assistance, manual configuration |
| **Full Rust stack**: Performance, memory-safety | Mixed Python/Go/Shell with overhead |

### Technical Investment (Ops Focus)

| Metric | Value |
| -------- | ------- |
| **Provisioning**: Nickel IaC, 80+ CLI shortcuts | ~40K LOC |
| **SecretumVault**: 4 crypto backends, 4 storage backends | ~11K LOC |
| **Vapora**: NATS JetStream, 12 agent roles | ~50K LOC |
| **TypeDialog**: 6 backends including prov-gen | ~90K LOC |
| **Kogral**: 6 node types, MCP server | ~15K LOC |
| **Total tests** | 4,360+ |
| **Crypto backends** | OpenSSL, OQS (PQC), AWS-LC, RustCrypto |
| **Storage backends** | FS, etcd, SurrealDB, PostgreSQL |

---

## Getting Started (Adoption for Ops Teams)

### Recommended Progressive Adoption

1. **SecretumVault**: Secrets management with cryptographic agility (standalone)
2. **Kogral**: Establish operational knowledge base (runbooks, ADRs, postmortems)
3. **TypeDialog**: Configuration wizards for teams (CLI + Web + prov-gen)
4. **Provisioning**: Multi-cloud declarative IaC with orchestrator
5. **Vapora**: Orchestrate Ops agents with budget control (DevOps, Monitor, Security)

Each project works independently. Synergies emerge when combining them.

### Quick Start per Project

**SecretumVault**:

```bash
# Docker Compose with etcd
docker-compose -f deploy/docker/docker-compose.yml up -d

# Initialize vault
curl -X POST http://localhost:8200/v1/sys/init -d '{"shares": 5, "threshold": 3}'

# Unseal with 3 shares
curl -X POST http://localhost:8200/v1/sys/unseal -d '{"key": "<share-1>"}'
curl -X POST http://localhost:8200/v1/sys/unseal -d '{"key": "<share-2>"}'
curl -X POST http://localhost:8200/v1/sys/unseal -d '{"key": "<share-3>"}'

# Enable PKI engine for certificates
svault secret engine enable pki
```

**Kogral**:

```bash
# Initialize knowledge repository
kogral init

# Add runbook
kogral add note "PostgreSQL Connection Pool Tuning" \
  --tags "database,postgresql,performance"

# Add ADR
kogral add decision "Choose Cilium over Calico" \
  --context "Need CNI for K8s with eBPF" \
  --decision "Cilium for performance and observability" \
  --consequences "Higher initial complexity, better long-term performance"

# Serve MCP server for Claude Code
kogral serve --port 3100
```

**Provisioning**:

```bash
# Clone repository
git clone https://repo.jesusperez.pro/jesus/provisioning
cd provisioning

# Configure provider (UpCloud in this example)
cp config/providers/upcloud.example.toml config/providers/upcloud.toml
# Edit with UpCloud credentials

# Create K8s cluster (Nickel definition)
cat > cluster.ncl <<EOF
{
  provider = "upcloud",
  region = "de-fra1",
  servers = [
    { name = "k8s-cp-01", plan = "medium", role = "control-plane" },
    { name = "k8s-worker-01", plan = "medium", role = "worker" },
    { name = "k8s-worker-02", plan = "medium", role = "worker" },
  ],
  taskservs = ["containerd", "etcd", "kubernetes", "cilium"],
}
EOF

# Validate configuration
nickel typecheck cluster.ncl

# Apply (orchestrator with checkpoints)
prov apply cluster.ncl --with-rollback
```

**TypeDialog (prov-gen)**:

```bash
# Execute cluster configuration wizard
typedialog execute examples/ops/cluster-setup.toml \
  --backend prov-gen \
  --output my-cluster.ncl

# Generated configuration ready for Provisioning
nickel typecheck my-cluster.ncl
prov apply my-cluster.ncl
```

**Vapora**:

```bash
# Deploy with Docker Compose (backend + NATS + SurrealDB)
docker-compose up -d

# Create project
curl -X POST http://localhost:8001/projects \
  -H "Content-Type: application/json" \
  -d '{"name": "Infrastructure Automation", "description": "DevOps pipelines"}'

# Create task for DevOps Agent
curl -X POST http://localhost:8001/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Deploy Prometheus to K8s",
    "task_type": "deployment",
    "context": {"cluster": "prod-us-east-1", "namespace": "monitoring"}
  }'

# Assign to DevOps Agent
curl -X POST http://localhost:8001/tasks/<task-id>/assign \
  -H "Content-Type: application/json" \
  -d '{"agent_role": "DevOps"}'
```

---

## Contact

- **Repositories**: GitHub (private projects)
- **Stack**: Rust, Nickel, Nushell, SurrealDB, Axum
- **License**: Proprietary / To be defined

---

*Modern infrastructure shouldn't require 10 disconnected tools.*
*One ecosystem. Five projects. Real integration for Ops/DevOps.*