
Ops/DevOps Portfolio: Technical Specifications

Ops Ecosystem Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                               INTERFACE LAYER                               │
├─────────────────────────────────────────────────────────────────────────────┤
│  Leptos WASM (Vapora Kanban)  │  Ratatui TUI  │  Axum REST  │  CLI (clap)   │
└─────────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                             ORCHESTRATION LAYER                             │
├─────────────────────────────────────────────────────────────────────────────┤
│  Vapora (NATS Agents)  │  Provisioning Orchestrator  │  TypeDialog Backends │
│  DevOps/Monitor/Security │  (Rust + Nushell hybrid)    │  (prov-gen IaC)    │
└─────────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                               SECURITY LAYER                                │
├─────────────────────────────────────────────────────────────────────────────┤
│  SecretumVault (PQC)   │  Cedar Policies (ABAC)  │  JWT + MFA (Auth)        │
│  KV/Transit/PKI/DB     │  Audit Logging          │  TLS/mTLS (Transport)    │
└─────────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                              PERSISTENCE LAYER                              │
├─────────────────────────────────────────────────────────────────────────────┤
│  SurrealDB (multi-tenant, time-series)  │  NATS JetStream (messaging)       │
│  etcd (distributed KV)                  │  PostgreSQL (ACID, enterprise)    │
│  Filesystem (git-native markdown)       │  Kogral (.kogral/ directory)      │
└─────────────────────────────────────────────────────────────────────────────┘

1. Provisioning: Ops Specifications

Directory Structure

provisioning/
├── core/
│   ├── cli/               # Main CLI (211 lines, 80+ shortcuts)
│   ├── nulib/             # Nushell libraries (476+ config accessors)
│   └── scripts/           # Utility scripts (Nushell)
├── extensions/
│   ├── providers/         # AWS, UpCloud, Local (LXD)
│   │   ├── aws/           # EC2, VPC, S3, RDS provisioners
│   │   ├── upcloud/       # Servers, networking, storage
│   │   └── local/         # LXD containers, networking
│   ├── taskservs/         # 50+ infrastructure services
│   │   ├── containerd/    # Container runtime
│   │   ├── etcd/          # Distributed KV store
│   │   ├── kubernetes/    # K8s control-plane + workers
│   │   ├── cilium/        # eBPF-based CNI
│   │   ├── postgresql/    # Database
│   │   ├── prometheus/    # Metrics
│   │   └── ...            # 44 more services
│   ├── clusters/          # Pre-configured cluster templates
│   │   ├── k8s-ha/        # HA Kubernetes (3 control-plane, N workers)
│   │   ├── k8s-dev/       # Dev Kubernetes (single-node)
│   │   └── db-cluster/    # PostgreSQL HA with Patroni
│   └── workflows/         # Automation workflows
│       ├── backup/        # Backup automation
│       ├── monitoring/    # Observability setup
│       └── security/      # Security hardening
├── platform/
│   ├── orchestrator/      # Workflow execution engine (Rust)
│   ├── control-center/    # Backend API (Axum + RBAC)
│   ├── control-center-ui/ # Web dashboard (Leptos)
│   ├── installer/         # Multi-mode installer
│   ├── mcp-server/        # MCP server (Rust, ~1000x faster than Python)
│   ├── ai-service/        # AI operations (LLM integration)
│   ├── rag/               # RAG system (1200+ docs)
│   ├── vault-service/     # Secrets management (integrates SecretumVault)
│   └── detector/          # Anomaly detection
└── schemas/               # Nickel IaC schemas (typed)
    ├── server.ncl         # Server definition contract
    ├── networking.ncl     # VPC, subnets, security groups
    ├── kubernetes.ncl     # K8s cluster contract
    └── ...                # 20+ schemas

Nickel IaC Schema Examples

Server Schema

# schemas/server.ncl
let Server = {
  name | String,
  provider | [| 'aws, 'upcloud, 'local |],

  spec | {
    cpu
      | Number
      | std.contract.from_predicate (fun n => n > 0)
      | doc "CPU cores must be positive"
      | default
      = 2,
    memory_gb
      | Number
      | std.contract.from_predicate (fun n => n > 0)
      | doc "Memory must be positive GB"
      | default
      = 4,
    disk_gb
      | Number
      | std.contract.from_predicate (fun n => n >= 20)
      | doc "Disk must be at least 20GB"
      | default
      = 50,

    os | {
      family | [| 'ubuntu, 'debian, 'rocky, 'alpine |],
      version | String,
    },
  },

  networking | {
    vpc | String | optional,
    subnet | String | optional,
    public_ip | Bool | default = false,
    security_groups | Array String | default = [],
  },

  tags | { _ : String } | default = {},
}
in Server

Kubernetes Cluster Schema

# schemas/kubernetes.ncl
let KubernetesCluster = {
  name | String,
  provider | [| 'aws, 'upcloud, 'local |],
  region | String,

  control_plane | {
    count | Number | default = 3,
    plan | [| 'small, 'medium, 'large |] | default = 'medium,
    high_availability | Bool | default = true,
  },

  workers | {
    count | Number | default = 3,
    plan | [| 'small, 'medium, 'large, 'xlarge |] | default = 'medium,
    auto_scaling | {
      enabled | Bool | default = false,
      min | Number | default = 3,
      max | Number | default = 10,
    } | optional,
  },

  networking | {
    vpc_cidr | String | default = "10.0.0.0/16",
    pod_cidr | String | default = "10.244.0.0/16",
    service_cidr | String | default = "10.96.0.0/12",
    cni | [| 'cilium, 'calico, 'flannel |] | default = 'cilium,
  },

  addons | {
    ingress_nginx | Bool | default = true,
    cert_manager | Bool | default = true,
    metrics_server | Bool | default = true,
    prometheus | Bool | default = false,
  },

  version | String | default = "1.28",
}
in KubernetesCluster

Orchestrator API (Rust)

// platform/orchestrator/src/lib.rs
use std::collections::HashMap;
use std::sync::Arc;

use anyhow::Result;
use serde::{Deserialize, Serialize};
use tokio::sync::RwLock;

pub struct Orchestrator {
    state: Arc<RwLock<StateManager>>,
    executor: WorkflowExecutor,
    scheduler: Scheduler,
    checkpoint_store: CheckpointStore,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Workflow {
    pub id: String,
    pub name: String,
    pub tasks: Vec<Task>,
    pub dependencies: HashMap<String, Vec<String>>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Task {
    pub id: String,
    pub task_type: TaskType,
    pub provider: Provider,
    pub config: serde_json::Value,
    pub retry_policy: RetryPolicy,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum TaskType {
    ProvisionServer,
    ConfigureNetworking,
    InstallService,
    RunHealthCheck,
    CreateBackup,
}

impl Orchestrator {
    pub async fn execute_workflow(&self, workflow: Workflow) -> Result<ExecutionResult> {
        // 1. Resolve dependencies (topological sort)
        let ordered_tasks = self.resolve_dependencies(&workflow)?;
        tracing::info!("Resolved {} tasks", ordered_tasks.len());

        // 2. Create execution checkpoint
        let checkpoint = self.checkpoint_store.create(&workflow).await?;
        tracing::info!("Created checkpoint: {}", checkpoint.id);

        // 3. Execute tasks with retry logic
        for (index, task) in ordered_tasks.iter().enumerate() {
            tracing::info!("Executing task {}/{}: {}", index + 1, ordered_tasks.len(), task.id);

            match self.executor.run(task).await {
                Ok(result) => {
                    self.state.write().await.record_success(task, &result)?;
                    self.checkpoint_store.update_progress(&checkpoint.id, index + 1).await?;
                }
                Err(e) => {
                    tracing::error!("Task {} failed: {}", task.id, e);

                    // Exponential backoff retry
                    if let Some(result) = self.retry_with_backoff(task).await? {
                        self.state.write().await.record_success(task, &result)?;
                    } else {
                        // Rollback to checkpoint
                        tracing::warn!("Rollback to checkpoint {}", checkpoint.id);
                        self.rollback(&checkpoint).await?;
                        return Err(e);
                    }
                }
            }
        }

        Ok(ExecutionResult::from_state(&*self.state.read().await))
    }

    async fn retry_with_backoff(&self, task: &Task) -> Result<Option<TaskResult>> {
        let mut delay_ms = task.retry_policy.initial_delay_ms;

        for attempt in 1..=task.retry_policy.max_retries {
            tracing::info!("Retry attempt {}/{} for task {}", attempt, task.retry_policy.max_retries, task.id);
            tokio::time::sleep(tokio::time::Duration::from_millis(delay_ms)).await;

            match self.executor.run(task).await {
                Ok(result) => return Ok(Some(result)),
                Err(e) => {
                    tracing::warn!("Retry {} failed: {}", attempt, e);
                    delay_ms = (delay_ms * task.retry_policy.backoff_multiplier).min(task.retry_policy.max_delay_ms);
                }
            }
        }

        Ok(None)
    }

    async fn rollback(&self, checkpoint: &Checkpoint) -> Result<()> {
        let completed_tasks = self.state.read().await.get_completed_tasks(&checkpoint.workflow_id)?;

        // Reverse order rollback
        for task in completed_tasks.iter().rev() {
            tracing::info!("Rolling back task: {}", task.id);
            self.executor.rollback(task).await?;
        }

        self.checkpoint_store.delete(&checkpoint.id).await?;
        Ok(())
    }

    fn resolve_dependencies(&self, workflow: &Workflow) -> Result<Vec<Task>> {
        // Topological sort
        let mut in_degree: HashMap<String, usize> = HashMap::new();
        let mut graph: HashMap<String, Vec<String>> = HashMap::new();

        for task in &workflow.tasks {
            in_degree.insert(task.id.clone(), 0);
            graph.insert(task.id.clone(), vec![]);
        }

        for (task_id, deps) in &workflow.dependencies {
            for dep in deps {
                graph.get_mut(dep).unwrap().push(task_id.clone());
                *in_degree.get_mut(task_id).unwrap() += 1;
            }
        }

        let mut queue: Vec<String> = in_degree
            .iter()
            .filter(|(_, &degree)| degree == 0)
            .map(|(id, _)| id.clone())
            .collect();

        let mut sorted = Vec::new();

        while let Some(task_id) = queue.pop() {
            sorted.push(task_id.clone());

            for neighbor in &graph[&task_id] {
                *in_degree.get_mut(neighbor).unwrap() -= 1;
                if in_degree[neighbor] == 0 {
                    queue.push(neighbor.clone());
                }
            }
        }

        if sorted.len() != workflow.tasks.len() {
            anyhow::bail!("Cyclic dependency detected");
        }

        Ok(sorted
            .into_iter()
            .map(|id| workflow.tasks.iter().find(|t| t.id == id).unwrap().clone())
            .collect())
    }
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct RetryPolicy {
    pub max_retries: u32,
    pub initial_delay_ms: u64,
    pub max_delay_ms: u64,
    pub backoff_multiplier: u64,
}

impl Default for RetryPolicy {
    fn default() -> Self {
        Self {
            max_retries: 3,
            initial_delay_ms: 1000,
            max_delay_ms: 60000,
            backoff_multiplier: 2,
        }
    }
}
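
With the defaults above, retry_with_backoff waits 1s, 2s, then 4s before giving up, with every delay clamped to max_delay_ms. A minimal sketch of that schedule:

// Delay schedule produced by RetryPolicy::default() (mirrors retry_with_backoff)
let policy = RetryPolicy::default();
let mut delay_ms = policy.initial_delay_ms;
for attempt in 1..=policy.max_retries {
    println!("attempt {attempt}: waiting {delay_ms}ms");  // 1000, 2000, 4000
    delay_ms = (delay_ms * policy.backoff_multiplier).min(policy.max_delay_ms);
}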

MCP Server Tools (Provisioning)

// platform/mcp-server/src/tools.rs
use std::sync::LazyLock;

use serde_json::json;

// `json!` allocates at runtime, so the registry is a lazily-initialized
// static rather than a `const`.
pub static MCP_TOOLS: LazyLock<Vec<Tool>> = LazyLock::new(|| vec![
    Tool {
        name: "query_infrastructure",
        description: "Query infrastructure state using natural language (RAG-powered)",
        parameters: json!({
            "query": { "type": "string", "description": "Natural language query" },
            "provider": { "type": "string", "optional": true, "enum": ["aws", "upcloud", "local"] }
        }),
    },
    Tool {
        name: "generate_config",
        description: "Generate Nickel configuration from natural language description",
        parameters: json!({
            "description": { "type": "string", "description": "Infrastructure description" },
            "provider": { "type": "string", "enum": ["aws", "upcloud", "local"] },
            "resource_type": { "type": "string", "enum": ["server", "network", "cluster", "database"] }
        }),
    },
    Tool {
        name: "validate_config",
        description: "Validate Nickel configuration against schemas",
        parameters: json!({
            "config": { "type": "string", "description": "Nickel configuration code" },
            "strict": { "type": "boolean", "default": true, "description": "Strict validation mode" }
        }),
    },
    Tool {
        name: "estimate_cost",
        description: "Estimate monthly cost for infrastructure configuration",
        parameters: json!({
            "config": { "type": "string", "description": "Nickel configuration" },
            "region": { "type": "string", "optional": true }
        }),
    },
    Tool {
        name: "check_compliance",
        description: "Check configuration against compliance frameworks",
        parameters: json!({
            "config": { "type": "string" },
            "framework": { "type": "string", "enum": ["soc2", "hipaa", "gdpr", "pci"] }
        }),
    },
    Tool {
        name: "plan_migration",
        description: "Generate migration plan between configurations",
        parameters: json!({
            "current": { "type": "string", "description": "Current Nickel config" },
            "target": { "type": "string", "description": "Target Nickel config" }
        }),
    },
    Tool {
        name: "execute_workflow",
        description: "Execute provisioning workflow with rollback support",
        parameters: json!({
            "workflow_id": { "type": "string" },
            "dry_run": { "type": "boolean", "default": true }
        }),
    },
]);
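
For context, a sketch of how this registry could answer an MCP tools/list request; the inputSchema field name follows MCP convention, and the surrounding server plumbing is elided:

// Hypothetical tools/list response built from the registry above
let tools: Vec<_> = MCP_TOOLS
    .iter()
    .map(|t| serde_json::json!({
        "name": t.name,
        "description": t.description,
        "inputSchema": t.parameters.clone(),
    }))
    .collect();
let response = serde_json::json!({ "tools": tools });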

CLI Shortcuts (80+)

# Core operations
prov init                  # Initialize provisioning workspace
prov plan <config.ncl>     # Generate execution plan (dry-run)
prov apply <config.ncl>    # Apply configuration with rollback
prov destroy <config.ncl>  # Destroy infrastructure
prov state list            # List resources in state
prov state show <id>       # Show resource details

# Provider management
prov provider add aws      # Add AWS provider credentials
prov provider add upcloud  # Add UpCloud provider credentials
prov provider list         # List configured providers
prov provider test <name>  # Test provider connectivity

# Service installation (taskservs)
prov service install containerd --servers server-01,server-02
prov service install kubernetes --cluster k8s-prod
prov service install cilium --cluster k8s-prod --version 1.14
prov service list          # List available services
prov service status <name> # Check service status

# Cluster operations
prov cluster create k8s-ha --template extensions/clusters/k8s-ha/
prov cluster scale k8s-prod --workers 10
prov cluster upgrade k8s-prod --version 1.28
prov cluster backup k8s-prod --output /backups/
prov cluster restore k8s-prod --from /backups/2026-01-22/

# Workflow operations
prov workflow run backup --cluster k8s-prod
prov workflow run monitoring --cluster k8s-prod
prov workflow list         # List available workflows
prov workflow status <id>  # Check workflow status

# Guided wizards
prov wizard cluster        # Interactive cluster setup
prov wizard database       # Interactive database setup
prov wizard monitoring     # Interactive monitoring setup

# AI-assisted operations (MCP)
prov mcp query "Show me all AWS servers in us-east-1"
prov mcp generate "Create a 3-node K8s cluster with Cilium on UpCloud"
prov mcp validate cluster.ncl
prov mcp estimate cluster.ncl

# Security operations
prov vault init            # Initialize SecretumVault integration
prov vault store secret/myapp key=value
prov vault read secret/myapp
prov cert generate --domain example.com --engine pki
prov cert rotate --cluster k8s-prod

# Observability
prov logs <resource-id>    # View resource logs
prov metrics <cluster>     # View cluster metrics
prov health <cluster>      # Health check
prov events <cluster>      # View events

# Configuration management
prov config get <key>      # Get config value (476+ accessors)
prov config set <key> <value>
prov config list           # List all configuration
prov config validate       # Validate configuration

2. SecretumVault: Ops Specifications

Architecture Overview

┌─────────────────────────────────────────────────────────────────┐
│  SecretumVault (~11K LOC, 50+ tests)                            │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │  CLI        │  │  REST API   │  │      Secrets Engines    │  │
│  │  (svault)   │  │  (Axum)     │  │  KV/Transit/PKI/DB      │  │
│  └──────┬──────┘  └──────┬──────┘  └────────────┬────────────┘  │
│         │                │                      │               │
│  ┌──────┴────────────────┴──────────────────────┴─────────────┐ │
│  │                    VaultCore                               │ │
│  │  Seal (Shamir) │ TokenManager │ Cedar ABAC │ Metrics       │ │
│  └────────────────────────────────────────────────────────────┘ │
│                          │                                      │
│  ┌───────────────────────┴───────────────────────────────────┐  │
│  │  Crypto Backends (Pluggable)                              │  │
│  │  OpenSSL │ OQS (PQC) │ AWS-LC │ RustCrypto                │  │
│  └───────────────────────────────────────────────────────────┘  │
│                          │                                      │
│  ┌───────────────────────┴───────────────────────────────────┐  │
│  │  Storage Backends (Pluggable)                             │  │
│  │  Filesystem │ etcd │ SurrealDB │ PostgreSQL               │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘

Core Types

// src/core/vault.rs
use std::collections::HashMap;
use std::sync::Arc;

use anyhow::Result;
use tokio::sync::{Mutex, RwLock};

pub struct VaultCore {
    // RwLock so engines can be mounted at runtime behind `&self`
    pub engines: RwLock<HashMap<String, Box<dyn Engine>>>,
    pub storage: Arc<dyn StorageBackend>,
    pub crypto: Arc<dyn CryptoBackend>,
    pub seal: Arc<Mutex<SealMechanism>>,
    pub token_manager: Arc<TokenManager>,
    pub authorizer: Arc<CedarAuthorizer>,
    pub metrics: Arc<Metrics>,
}

impl VaultCore {
    pub async fn init(&self, shares: u8, threshold: u8) -> Result<Vec<SecretShare>> {
        // Initialize Shamir unsealing
        let mut seal = self.seal.lock().await;
        let shares = seal.init(shares, threshold)?;

        // Setup default engines
        self.mount_engine("secret", Box::new(KvEngine::new(self.storage.clone()))).await?;

        Ok(shares)
    }

    pub async fn unseal(&self, share: SecretShare) -> Result<UnsealProgress> {
        let mut seal = self.seal.lock().await;
        let progress = seal.unseal(share)?;

        if let UnsealProgress::Complete = progress {
            tracing::info!("Vault unsealed successfully");
            self.metrics.record_unseal().await;
        }

        Ok(progress)
    }

    pub async fn mount_engine(&self, path: &str, engine: Box<dyn Engine>) -> Result<()> {
        self.engines.write().await.insert(path.to_string(), engine);
        tracing::info!("Mounted engine at path: {}", path);
        Ok(())
    }
}
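
A hypothetical operator flow for the Shamir seal above: initialize with 5 shares and a threshold of 3, after which any 3 shares unseal (`vault` stands in for an initialized VaultCore):

// Sketch: 3-of-5 Shamir unseal flow
let shares = vault.init(5, 3).await?;      // distribute the 5 shares to operators
for share in shares.into_iter().take(3) {  // any 3 shares reconstruct the seal key
    if let UnsealProgress::Complete = vault.unseal(share).await? {
        println!("vault unsealed");
        break;
    }
}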

Crypto Backends (Post-Quantum)

// src/crypto/backends/oqs.rs (Post-Quantum)
use std::sync::Arc;

use aes_gcm::aead::{Aead, AeadCore, KeyInit, OsRng};
use aes_gcm::Aes256Gcm;
use anyhow::Result;
use async_trait::async_trait;
use oqs::{kem, sig};
use tokio::sync::Mutex;

pub struct OqsBackend {
    kem_algorithm: kem::Algorithm,   // MlKem768
    sig_algorithm: sig::Algorithm,   // MlDsa65
    kem_cache: Arc<Mutex<Option<kem::Kem>>>,
    sig_cache: Arc<Mutex<Option<sig::Sig>>>,
}

impl OqsBackend {
    pub fn new() -> Result<Self> {
        Ok(Self {
            kem_algorithm: kem::Algorithm::MlKem768,
            sig_algorithm: sig::Algorithm::MlDsa65,
            kem_cache: Arc::new(Mutex::new(None)),
            sig_cache: Arc::new(Mutex::new(None)),
        })
    }

    pub async fn kem_keypair(&self) -> CryptoResult<KemKeyPair> {
        let mut cache = self.kem_cache.lock().await;

        if cache.is_none() {
            *cache = Some(kem::Kem::new(self.kem_algorithm)?);
        }

        let kem = cache.as_ref().unwrap();
        let (pk, sk) = kem.keypair()?;

        // ML-KEM-768: 1184 bytes public key, 2400 bytes secret key
        Ok(KemKeyPair {
            public_key: pk.into_vec(),
            secret_key: sk.into_vec(),
        })
    }

    pub async fn kem_encapsulate(&self, public_key: &[u8]) -> CryptoResult<KemResult> {
        let mut cache = self.kem_cache.lock().await;

        if cache.is_none() {
            *cache = Some(kem::Kem::new(self.kem_algorithm)?);
        }

        let kem = cache.as_ref().unwrap();
        let pk = kem
            .public_key_from_bytes(public_key)
            .expect("valid ML-KEM-768 public key bytes");
        let (ciphertext, shared_secret) = kem.encapsulate(pk)?;

        // ML-KEM-768: 1088 bytes ciphertext, 32 bytes shared secret
        Ok(KemResult {
            ciphertext: ciphertext.into_vec(),
            shared_secret: shared_secret.into_vec(),
        })
    }

    pub async fn kem_decapsulate(&self, secret_key: &[u8], ciphertext: &[u8]) -> CryptoResult<Vec<u8>> {
        let mut cache = self.kem_cache.lock().await;

        if cache.is_none() {
            *cache = Some(kem::Kem::new(self.kem_algorithm)?);
        }

        let kem = cache.as_ref().unwrap();
        let sk = kem
            .secret_key_from_bytes(secret_key)
            .expect("valid ML-KEM-768 secret key bytes");
        let ct = kem
            .ciphertext_from_bytes(ciphertext)
            .expect("valid ML-KEM-768 ciphertext bytes");
        let shared_secret = kem.decapsulate(sk, ct)?;

        Ok(shared_secret.into_vec())
    }

    pub async fn sign(&self, secret_key: &[u8], message: &[u8]) -> CryptoResult<Vec<u8>> {
        let mut cache = self.sig_cache.lock().await;

        if cache.is_none() {
            *cache = Some(sig::Sig::new(self.sig_algorithm)?);
        }

        let sig_obj = cache.as_ref().unwrap();
        let sk = sig_obj
            .secret_key_from_bytes(secret_key)
            .expect("valid ML-DSA-65 secret key bytes");
        let signature = sig_obj.sign(message, sk)?;

        // ML-DSA-65: 3309 bytes signature
        Ok(signature.into_vec())
    }

    pub async fn verify(&self, public_key: &[u8], message: &[u8], signature: &[u8]) -> CryptoResult<bool> {
        let mut cache = self.sig_cache.lock().await;

        if cache.is_none() {
            *cache = Some(sig::Sig::new(self.sig_algorithm)?);
        }

        let sig_obj = cache.as_ref().unwrap();
        let pk = sig_obj
            .public_key_from_bytes(public_key)
            .expect("valid ML-DSA-65 public key bytes");
        let sig_bytes = sig_obj
            .signature_from_bytes(signature)
            .expect("well-formed ML-DSA-65 signature bytes");

        match sig_obj.verify(message, sig_bytes, pk) {
            Ok(_) => Ok(true),
            Err(_) => Ok(false),
        }
    }
}

#[async_trait]
impl CryptoBackend for OqsBackend {
    async fn generate_keypair(&self, algorithm: KeyAlgorithm) -> CryptoResult<KeyPair> {
        match algorithm {
            KeyAlgorithm::MlKem768 => {
                let kem_pair = self.kem_keypair().await?;
                Ok(KeyPair {
                    public_key: kem_pair.public_key,
                    private_key: kem_pair.secret_key,
                })
            }
            KeyAlgorithm::MlDsa65 => {
                let mut cache = self.sig_cache.lock().await;
                if cache.is_none() {
                    *cache = Some(sig::Sig::new(self.sig_algorithm)?);
                }
                let sig_obj = cache.as_ref().unwrap();
                let (pk, sk) = sig_obj.keypair()?;
                Ok(KeyPair {
                    public_key: pk.into_vec(),
                    private_key: sk.into_vec(),
                })
            }
            _ => Err(CryptoError::UnsupportedAlgorithm),
        }
    }

    async fn encrypt(&self, plaintext: &[u8]) -> CryptoResult<Vec<u8>> {
        // Encapsulate against a KEM public key to derive a shared secret.
        // An ephemeral keypair stands in here for illustration; in practice
        // the recipient's public key comes from the vault.
        let kem_pair = self.kem_keypair().await?;
        let kem_result = self.kem_encapsulate(&kem_pair.public_key).await?;

        // Use shared secret for AES-256-GCM encryption
        let cipher = Aes256Gcm::new_from_slice(&kem_result.shared_secret)?;
        let nonce = Aes256Gcm::generate_nonce(&mut OsRng);
        let ciphertext = cipher.encrypt(&nonce, plaintext)?;

        // Prepend ciphertext with KEM ciphertext and nonce
        let mut result = Vec::new();
        result.extend_from_slice(&kem_result.ciphertext);
        result.extend_from_slice(&nonce);
        result.extend_from_slice(&ciphertext);

        Ok(result)
    }

    async fn decrypt(&self, ciphertext: &[u8]) -> CryptoResult<Vec<u8>> {
        // Extract KEM ciphertext, nonce, and encrypted data
        let kem_ct = &ciphertext[..1088];          // ML-KEM-768 ciphertext size
        let nonce = &ciphertext[1088..1088 + 12];  // AES-GCM nonce
        let encrypted = &ciphertext[1088 + 12..];

        // Decapsulate with the vault-held KEM secret key to recover the shared
        // secret, then AES-256-GCM-decrypt `encrypted` under it. Simplified:
        // secret-key retrieval and the symmetric step are elided here.
        let _ = (kem_ct, nonce, encrypted);
        unimplemented!("retrieve secret key from vault, decapsulate, then AES-256-GCM decrypt")
    }
}
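
A small sanity-check sketch of the hybrid layout encrypt produces, using the ML-KEM-768 sizes quoted in the comments above (the sizes come from those comments, not from an independent check):

// Hybrid ciphertext layout: [KEM ciphertext (1088) | GCM nonce (12) | AES-256-GCM data]
let backend = OqsBackend::new()?;
let keys = backend.kem_keypair().await?;
assert_eq!(keys.public_key.len(), 1184);  // ML-KEM-768 public key
assert_eq!(keys.secret_key.len(), 2400);  // ML-KEM-768 secret key
let sealed = backend.encrypt(b"top secret").await?;
assert!(sealed.len() > 1088 + 12);        // header precedes the symmetric payload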

Secrets Engines

// src/engines/mod.rs
use std::sync::Arc;

use anyhow::Result;
use async_trait::async_trait;
use serde_json::Value;

#[async_trait]
pub trait Engine: Send + Sync {
    fn name(&self) -> &str;
    fn engine_type(&self) -> &str;
    async fn read(&self, path: &str) -> Result<Option<Value>>;
    async fn write(&self, path: &str, data: &Value) -> Result<()>;
    async fn delete(&self, path: &str) -> Result<()>;
    async fn list(&self, prefix: &str) -> Result<Vec<String>>;
}

// src/engines/kv.rs (Key-Value Engine)
pub struct KvEngine {
    storage: Arc<dyn StorageBackend>,
    max_versions: usize,
}

impl KvEngine {
    pub fn new(storage: Arc<dyn StorageBackend>) -> Self {
        Self {
            storage,
            max_versions: 10,  // Keep 10 versions by default
        }
    }
}

#[async_trait]
impl Engine for KvEngine {
    fn name(&self) -> &str { "kv" }
    fn engine_type(&self) -> &str { "kv-v2" }

    async fn write(&self, path: &str, data: &Value) -> Result<()> {
        let full_path = format!("secret/data/{}", path);

        // Get current version
        let current_version = self.get_current_version(path).await?;
        let new_version = current_version + 1;

        // Create versioned entry
        let entry = VersionedSecret {
            version: new_version,
            created_time: Utc::now(),
            data: data.clone(),
        };

        // Store
        self.storage.store_secret(&full_path, &entry).await?;

        // Cleanup old versions
        self.cleanup_old_versions(path, new_version).await?;

        Ok(())
    }

    async fn read(&self, path: &str) -> Result<Option<Value>> {
        let full_path = format!("secret/data/{}", path);

        match self.storage.get_secret(&full_path).await? {
            Some(entry) => Ok(Some(entry.data)),
            None => Ok(None),
        }
    }
}

// src/engines/database.rs (Dynamic Credentials)
pub struct DatabaseEngine {
    storage: Arc<dyn StorageBackend>,
    connections: Arc<RwLock<HashMap<String, DatabaseConnection>>>,
}

#[async_trait]
impl Engine for DatabaseEngine {
    fn name(&self) -> &str { "database" }
    fn engine_type(&self) -> &str { "database" }

    async fn write(&self, path: &str, data: &Value) -> Result<()> {
        // Configure database connection
        let config: DatabaseConfig = serde_json::from_value(data.clone())?;

        let connection = DatabaseConnection::new(&config).await?;
        self.connections.write().await.insert(path.to_string(), connection);

        Ok(())
    }

    async fn read(&self, path: &str) -> Result<Option<Value>> {
        // Generate dynamic credentials
        // path format: "database/creds/{role}"

        if !path.starts_with("creds/") {
            return Ok(None);
        }

        let role = path.strip_prefix("creds/").unwrap();
        let role_config: RoleConfig = self.get_role_config(role).await?;

        // Generate username/password
        let username = format!("v-{}-{}", role, Uuid::new_v4());
        let password = generate_secure_password(32);

        // Create user in database
        let connections = self.connections.read().await;
        let db_conn = connections.get(&role_config.db_name)
            .ok_or_else(|| anyhow!("Database connection not found"))?;

        db_conn.execute(&role_config.creation_statements
            .replace("{{name}}", &username)
            .replace("{{password}}", &password)
            .replace("{{expiration}}", &format_expiration(role_config.default_ttl))
        ).await?;

        // Create lease for cleanup (assumes `DatabaseConnection: Clone` so the
        // revocation closure owns its own handle)
        let lease_id = format!("database/creds/{}/{}", role, Uuid::new_v4());
        let revoke_conn = db_conn.clone();
        let revoke_user = username.clone();
        self.create_lease(&lease_id, role_config.default_ttl, move |_vault| {
            // Revoke credentials on lease expiration
            async move {
                revoke_conn.execute(&format!("DROP USER '{}'", revoke_user)).await?;
                Ok(())
            }
        }).await?;

        Ok(Some(json!({
            "lease_id": lease_id,
            "lease_duration": role_config.default_ttl.as_secs(),
            "username": username,
            "password": password,
        })))
    }
}
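
A hypothetical caller of the dynamic-credentials path above; "creds/readonly" assumes a role named readonly was configured through the write path beforehand:

// Request short-lived database credentials through the engine's read path
if let Some(creds) = db_engine.read("creds/readonly").await? {
    println!(
        "user={} lease={}s",
        creds["username"], creds["lease_duration"]
    );
}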

Configuration (TOML)

# svault.toml
[vault]
# Crypto backend: openssl | oqs | aws-lc | rustcrypto
crypto_backend = "oqs"  # Post-quantum by default

[server]
address = "0.0.0.0:8200"
tls_cert = "/etc/svault/certs/server.pem"
tls_key = "/etc/svault/certs/server-key.pem"
# Client certificate verification (mTLS)
client_ca = "/etc/svault/certs/ca.pem"
require_client_cert = false

[storage]
# Backend: filesystem | etcd | surrealdb | postgresql
backend = "etcd"

[storage.etcd]
endpoints = ["http://etcd-01:2379", "http://etcd-02:2379", "http://etcd-03:2379"]
username = "svault"
password = "secret"
# TLS for etcd
ca_cert = "/etc/svault/etcd-ca.pem"
client_cert = "/etc/svault/etcd-client.pem"
client_key = "/etc/svault/etcd-client-key.pem"

[seal.shamir]
# Shamir secret sharing configuration
shares = 5
threshold = 3

[auth]
# Token TTL
token_ttl = "24h"
token_max_ttl = "720h"  # 30 days

[audit]
# Audit log retention
enabled = true
retention_days = 2555  # 7 years
backend = "file"
path = "/var/log/svault/audit.log"

[engines]
# Default engines to mount on init
kv = { path = "secret", version = 2 }
transit = { path = "transit" }
pki = { path = "pki", max_lease_ttl = "87600h" }  # 10 years
database = { path = "database" }

[metrics]
# Prometheus metrics
enabled = true
address = "0.0.0.0:9090"

3. Vapora: Ops Agent Specifications

Agent Roles for Ops

// crates/vapora-agents/src/roles.rs
#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum AgentRole {
    // Development roles
    Architect,
    Developer,
    CodeReviewer,
    Tester,
    Documenter,

    // Marketing/Communication
    Marketer,
    Presenter,

    // Ops/DevOps roles
    DevOps,         // CI/CD, deployment, automation
    Monitor,        // Health checks, alerting, metrics
    Security,       // Vulnerability scanning, compliance

    // Management
    ProjectManager,
    DecisionMaker,
}

impl AgentRole {
    pub fn default_provider(&self) -> LLMProvider {
        match self {
            // High-complexity ops tasks: Claude Opus
            AgentRole::Security => LLMProvider::Claude { model: "claude-opus-4-20250514" },
            AgentRole::DecisionMaker => LLMProvider::Claude { model: "claude-opus-4-20250514" },

            // Standard ops tasks: Claude Sonnet
            AgentRole::DevOps => LLMProvider::Claude { model: "claude-sonnet-4-20250514" },
            AgentRole::ProjectManager => LLMProvider::Claude { model: "claude-sonnet-4-20250514" },

            // Real-time monitoring: Gemini Flash (low latency)
            AgentRole::Monitor => LLMProvider::Gemini { model: "gemini-2.0-flash-exp" },

            _ => LLMProvider::Claude { model: "claude-sonnet-4-20250514" },
        }
    }

    pub fn can_block_pipeline(&self) -> bool {
        matches!(self, AgentRole::Security)
    }

    pub fn requires_approval(&self) -> bool {
        matches!(self, AgentRole::DevOps | AgentRole::Security)
    }
}

NATS Message Patterns (Ops)

// crates/vapora-agents/src/messages.rs
#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum AgentMessage {
    TaskAssignment {
        task_id: String,
        agent_id: String,
        agent_role: AgentRole,
        task_type: String,
        payload: serde_json::Value,
        priority: Priority,
    },
    TaskResult {
        task_id: String,
        agent_id: String,
        agent_role: AgentRole,
        status: TaskStatus,
        output: Option<String>,
        duration_ms: u64,
        tokens_used: u32,
        cost_cents: f64,
    },
    Heartbeat {
        agent_id: String,
        agent_role: AgentRole,
        status: AgentStatus,
        current_load: f64,
        last_task_completed_at: Option<DateTime<Utc>>,
    },
    Alert {
        severity: AlertSeverity,
        source: String,
        message: String,
        metadata: serde_json::Value,
    },
    ApprovalRequest {
        task_id: String,
        requester: AgentRole,
        action: String,
        details: serde_json::Value,
    },
    ApprovalResponse {
        task_id: String,
        approved: bool,
        approver: String,
        reason: Option<String>,
    },
}

// NATS subjects
pub const TASK_ASSIGNMENT: &str = "vapora.tasks.assign";
pub const TASK_RESULTS: &str = "vapora.tasks.results";
pub const AGENT_HEARTBEAT: &str = "vapora.agents.heartbeat";
pub const ALERTS: &str = "vapora.alerts";
pub const APPROVALS_REQUEST: &str = "vapora.approvals.request";
pub const APPROVALS_RESPONSE: &str = "vapora.approvals.response";

#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum AlertSeverity {
    Info,
    Warning,
    Error,
    Critical,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum Priority {
    Low = 1,
    Normal = 2,
    High = 3,
    Critical = 4,
}
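
A minimal heartbeat-publisher sketch over the subject constants above, using the async_nats client (the URL and agent values are placeholders, and AgentStatus::Idle is an assumed variant):

// Publish a heartbeat on vapora.agents.heartbeat
let client = async_nats::connect("nats://localhost:4222").await?;
let msg = AgentMessage::Heartbeat {
    agent_id: "devops-01".into(),
    agent_role: AgentRole::DevOps,
    status: AgentStatus::Idle,  // assumed variant
    current_load: 0.1,
    last_task_completed_at: None,
};
client
    .publish(AGENT_HEARTBEAT.to_string(), serde_json::to_vec(&msg)?.into())
    .await?;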

Budget Control (Ops Agents)

// crates/vapora-llm-router/src/budget.rs
use std::collections::HashMap;
use std::sync::Arc;

use chrono::{DateTime, Utc};
use serde::{Deserialize, Serialize};
use tokio::sync::RwLock;

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct BudgetConfig {
    pub role: AgentRole,
    pub monthly_limit_cents: u32,
    pub weekly_limit_cents: Option<u32>,
    pub enforcement: BudgetEnforcement,
    pub fallback_chain: Vec<LLMProvider>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum BudgetEnforcement {
    Normal,           // Under 80% of limit
    NearThreshold,    // 80-100% of limit, use fallback
    Exceeded,         // Over limit, block or use cheapest fallback only
}

pub struct BudgetTracker {
    configs: HashMap<AgentRole, BudgetConfig>,
    usage: Arc<RwLock<HashMap<AgentRole, UsageStats>>>,
}

impl BudgetTracker {
    pub async fn check_budget(&self, role: AgentRole, estimated_cost_cents: f64) -> BudgetEnforcement {
        let config = self.configs.get(&role).unwrap();
        let usage = self.usage.read().await;
        let default_stats = UsageStats::default();
        let stats = usage.get(&role).unwrap_or(&default_stats);

        let monthly_usage = stats.monthly_cost_cents;
        let weekly_usage = stats.weekly_cost_cents;

        // Check weekly limit first (if set)
        if let Some(weekly_limit) = config.weekly_limit_cents {
            if weekly_usage + estimated_cost_cents > weekly_limit as f64 {
                return BudgetEnforcement::Exceeded;
            } else if weekly_usage + estimated_cost_cents > (weekly_limit as f64 * 0.8) {
                return BudgetEnforcement::NearThreshold;
            }
        }

        // Check monthly limit
        if monthly_usage + estimated_cost_cents > config.monthly_limit_cents as f64 {
            BudgetEnforcement::Exceeded
        } else if monthly_usage + estimated_cost_cents > (config.monthly_limit_cents as f64 * 0.8) {
            BudgetEnforcement::NearThreshold
        } else {
            BudgetEnforcement::Normal
        }
    }

    pub async fn select_provider(&self, role: AgentRole, task_type: &str) -> LLMProvider {
        let enforcement = self.check_budget(role, self.estimate_cost(task_type)).await;
        let config = self.configs.get(&role).unwrap();

        match enforcement {
            BudgetEnforcement::Normal => {
                // Use default provider for role
                role.default_provider()
            }
            BudgetEnforcement::NearThreshold => {
                // Use first fallback (cheaper)
                config.fallback_chain.first()
                    .cloned()
                    .unwrap_or_else(|| role.default_provider())
            }
            BudgetEnforcement::Exceeded => {
                // Use cheapest fallback (typically Ollama local)
                config.fallback_chain.last()
                    .cloned()
                    .unwrap_or_else(|| LLMProvider::Ollama { model: "llama3.1:8b" })
            }
        }
    }

    pub async fn record_usage(&self, role: AgentRole, cost_cents: f64) {
        let mut usage = self.usage.write().await;
        let stats = usage.entry(role).or_insert_with(UsageStats::default);

        stats.monthly_cost_cents += cost_cents;
        stats.weekly_cost_cents += cost_cents;
        stats.total_requests += 1;
        stats.last_updated = Utc::now();
    }
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct UsageStats {
    pub monthly_cost_cents: f64,
    pub weekly_cost_cents: f64,
    pub total_requests: u64,
    pub last_updated: DateTime<Utc>,
}

impl Default for UsageStats {
    // Manual impl: `DateTime<Utc>` has no `Default`, so derive won't work
    fn default() -> Self {
        Self {
            monthly_cost_cents: 0.0,
            weekly_cost_cents: 0.0,
            total_requests: 0,
            last_updated: Utc::now(),
        }
    }
}
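
A worked example of the 80% threshold logic in check_budget, with hypothetical numbers (a role capped at 5000 cents per month):

// 3900 spent + 150 estimated = 4050 > 4000 (80% of 5000) -> NearThreshold
let (limit_cents, used, estimate) = (5000u32, 3900.0_f64, 150.0_f64);
let enforcement = if used + estimate > limit_cents as f64 {
    BudgetEnforcement::Exceeded
} else if used + estimate > limit_cents as f64 * 0.8 {
    BudgetEnforcement::NearThreshold
} else {
    BudgetEnforcement::Normal
};
assert!(matches!(enforcement, BudgetEnforcement::NearThreshold));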

Prometheus Metrics (Ops)

// crates/vapora-telemetry/src/metrics.rs
use prometheus::{Counter, Encoder, Gauge, Histogram, HistogramOpts, Registry};

pub struct VaporaMetrics {
    registry: Registry,

    // Budget metrics
    budget_utilization: Gauge,
    budget_exceeded_total: Counter,
    fallback_triggers_total: Counter,

    // Agent metrics
    active_agents: Gauge,
    task_duration_seconds: Histogram,
    task_status_total: Counter,

    // Cost metrics
    llm_cost_cents_total: Counter,
    tokens_used_total: Counter,
}

impl VaporaMetrics {
    pub fn new() -> Self {
        let registry = Registry::new();

        let budget_utilization = Gauge::new(
            "vapora_budget_utilization_ratio",
            "Budget utilization ratio (0.0-1.0) per agent role"
        ).unwrap();

        let budget_exceeded_total = Counter::new(
            "vapora_budget_exceeded_total",
            "Total number of budget exceeded events per agent role"
        ).unwrap();

        let fallback_triggers_total = Counter::new(
            "vapora_fallback_triggers_total",
            "Total number of fallback provider triggers due to budget"
        ).unwrap();

        let active_agents = Gauge::new(
            "vapora_active_agents",
            "Number of active agents by role and status"
        ).unwrap();

        let task_duration_seconds = Histogram::with_opts(HistogramOpts::new(
            "vapora_task_duration_seconds",
            "Task execution duration in seconds",
        ))
        .unwrap();

        let task_status_total = Counter::new(
            "vapora_task_status_total",
            "Total tasks by status (success, failed, timeout)"
        ).unwrap();

        let llm_cost_cents_total = Counter::new(
            "vapora_llm_cost_cents_total",
            "Total LLM cost in cents per provider and role"
        ).unwrap();

        let tokens_used_total = Counter::new(
            "vapora_tokens_used_total",
            "Total tokens used per provider and role"
        ).unwrap();

        registry.register(Box::new(budget_utilization.clone())).unwrap();
        registry.register(Box::new(budget_exceeded_total.clone())).unwrap();
        registry.register(Box::new(fallback_triggers_total.clone())).unwrap();
        registry.register(Box::new(active_agents.clone())).unwrap();
        registry.register(Box::new(task_duration_seconds.clone())).unwrap();
        registry.register(Box::new(task_status_total.clone())).unwrap();
        registry.register(Box::new(llm_cost_cents_total.clone())).unwrap();
        registry.register(Box::new(tokens_used_total.clone())).unwrap();

        Self {
            registry,
            budget_utilization,
            budget_exceeded_total,
            fallback_triggers_total,
            active_agents,
            task_duration_seconds,
            task_status_total,
            llm_cost_cents_total,
            tokens_used_total,
        }
    }

    pub fn export(&self) -> String {
        let encoder = prometheus::TextEncoder::new();
        let metric_families = self.registry.gather();
        let mut buffer = Vec::new();
        encoder.encode(&metric_families, &mut buffer).unwrap();
        String::from_utf8(buffer).unwrap()
    }
}
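
A sketch of wiring export into a scrape endpoint with axum (the /metrics route and port 9090 are assumptions here):

// Expose the Prometheus text format for scraping
use axum::{routing::get, Router};
use std::sync::Arc;

let metrics = Arc::new(VaporaMetrics::new());
let app = Router::new().route(
    "/metrics",
    get(move || {
        let metrics = metrics.clone();
        async move { metrics.export() }
    }),
);
let listener = tokio::net::TcpListener::bind("0.0.0.0:9090").await?;
axum::serve(listener, app).await?;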

4. TypeDialog (prov-gen): IaC Generation Specifications

Prov-Gen Backend

// crates/typedialog-prov-gen/src/lib.rs
use anyhow::Result;
use serde::{Deserialize, Serialize};
use tera::{Context, Tera};

pub struct ProvGenBackend {
    templates: Tera,
    validators: Vec<Box<dyn Validator>>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct InfrastructureConfig {
    pub provider: CloudProvider,
    pub region: String,
    pub resources: Vec<Resource>,
    pub networking: NetworkConfig,
    pub security: SecurityConfig,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum CloudProvider {
    Aws,
    Gcp,
    Azure,
    Hetzner,
    UpCloud,
    Local,  // LXD
}

pub struct Generator {
    templates: tera::Tera,
    validators: Vec<Box<dyn Validator>>,
}

impl Generator {
    pub async fn generate(&self, config: &InfrastructureConfig) -> Result<GeneratedIaC> {
        // 1. Validate input config (7-layer validation)
        self.validate_config(config)?;

        // 2. Select the provider-specific template
        let template_name = format!("{}.ncl.tera", config.provider.as_str());

        // 3. Create template context
        let mut context = Context::new();
        context.insert("provider", &config.provider);
        context.insert("region", &config.region);
        context.insert("resources", &config.resources);
        context.insert("networking", &config.networking);
        context.insert("security", &config.security);

        // 4. Render Nickel configuration
        let nickel_code = self.templates.render(&template_name, &context)?;

        // 5. Validate generated Nickel
        self.validate_nickel(&nickel_code)?;

        // 6. Split into logical files
        let files = self.split_to_files(&nickel_code)?;

        Ok(GeneratedIaC {
            provider: config.provider.clone(),
            main_file: nickel_code,
            files,
            validation_passed: true,
        })
    }

    fn validate_config(&self, config: &InfrastructureConfig) -> Result<()> {
        for validator in &self.validators {
            validator.validate(config)?;
        }
        Ok(())
    }

    fn validate_nickel(&self, code: &str) -> Result<()> {
        use std::io::Write;
        use std::process::{Command, Stdio};

        // Run `nickel typecheck` on the generated code via stdin
        let mut child = Command::new("nickel")
            .arg("typecheck")
            .arg("--stdin")
            .stdin(Stdio::piped())
            .stdout(Stdio::piped())
            .stderr(Stdio::piped())
            .spawn()?;

        child.stdin.take().unwrap().write_all(code.as_bytes())?;
        let output = child.wait_with_output()?;

        if !output.status.success() {
            anyhow::bail!(
                "nickel typecheck failed: {}",
                String::from_utf8_lossy(&output.stderr)
            );
        }

        Ok(())
    }
}

Templates (Tera + Nickel)

{# templates/aws.ncl.tera - AWS multi-cloud template #}
{# Generated by TypeDialog prov-gen backend #}

{
  provider = "aws",
  region = "{{ region }}",

  {% if resources.servers %}
  servers = [
    {% for server in resources.servers %}
    {
      name = "{{ server.name }}",
      plan = "{{ server.plan }}",
      role = {% if server.role %}"{{ server.role }}"{% else %}null{% endif %},
      provider = "aws",

      spec = {
        cpu = {{ server.cpu | default(value=2) }},
        memory_gb = {{ server.memory_gb | default(value=4) }},
        disk_gb = {{ server.disk_gb | default(value=50) }},

        os = {
          family = 'ubuntu,
          version = "{{ server.os_version | default(value='22.04') }}",
        },
      },

      networking = {
        vpc = "{{ networking.vpc_id }}",
        subnet = "{{ networking.subnet_id }}",
        public_ip = {{ server.public_ip | default(value=false) }},
        security_groups = {{ server.security_groups | default(value=[]) | json_encode }},
      },

      tags = {
        Environment = "{{ environment | default(value='production') }}",
        ManagedBy = "provisioning",
        {% for key, value in server.tags %}
        "{{ key }}" = "{{ value }}",
        {% endfor %}
      },
    },
    {% endfor %}
  ],
  {% endif %}

  {% if resources.taskservs %}
  taskservs = {{ resources.taskservs | json_encode }},
  {% endif %}

  networking = {
    vpc_cidr = "{{ networking.vpc_cidr | default(value='10.0.0.0/16') }}",
    {% if networking.pod_cidr %}
    pod_cidr = "{{ networking.pod_cidr }}",
    service_cidr = "{{ networking.service_cidr }}",
    {% endif %}
  },

  {% if security %}
  security = {
    {% if security.enable_encryption %}
    encryption_at_rest = true,
    kms_key_id = "{{ security.kms_key_id }}",
    {% endif %}

    {% if security.enable_audit_logging %}
    audit_logging = {
      enabled = true,
      retention_days = {{ security.audit_retention_days | default(value=2555) }},
    },
    {% endif %}
  },
  {% endif %}
}

5. Kogral: Knowledge Management Specifications

Node Types (Ops Focus)

// kogral-core/src/models.rs
use std::collections::HashMap;

use chrono::{DateTime, Utc};
use serde::{Deserialize, Serialize};
use uuid::Uuid;

#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum NodeType {
    Note,        // General notes, documentation
    Decision,    // ADRs (Architectural Decision Records)
    Guideline,   // Team/org standards, policies
    Pattern,     // Reusable solutions, best practices
    Journal,     // Daily development/ops log
    Execution,   // Agent execution records, postmortems, incidents
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Node {
    pub id: String,
    pub node_type: NodeType,
    pub title: String,
    pub content: String,              // Markdown body
    pub metadata: HashMap<String, String>,
    pub tags: Vec<String>,
    pub created_at: DateTime<Utc>,
    pub updated_at: DateTime<Utc>,
    pub author: Option<String>,
}

// Example: Execution node for incident postmortem
impl Node {
    pub fn new_execution(title: &str, incident_details: IncidentDetails) -> Self {
        let content = format!(
            "# {}\n\n\
            ## Timeline\n\
            - **Started**: {}\n\
            - **Detected**: {}\n\
            - **Resolved**: {}\n\
            - **Duration**: {:?}\n\n\
            ## Root Cause\n\
            {}\n\n\
            ## Resolution\n\
            {}\n\n\
            ## Action Items\n\
            {}\n\n\
            ## Related Resources\n\
            {}",
            title,
            incident_details.started_at,
            incident_details.detected_at,
            incident_details.resolved_at,
            incident_details.duration,
            incident_details.root_cause,
            incident_details.resolution,
            incident_details.action_items.join("\n"),
            incident_details.related_resources.join("\n"),
        );

        Self {
            id: Uuid::new_v4().to_string(),
            node_type: NodeType::Execution,
            title: title.to_string(),
            content,
            metadata: incident_details.metadata,
            tags: incident_details.tags,
            created_at: Utc::now(),
            updated_at: Utc::now(),
            author: Some(incident_details.author),
        }
    }
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct IncidentDetails {
    pub started_at: DateTime<Utc>,
    pub detected_at: DateTime<Utc>,
    pub resolved_at: DateTime<Utc>,
    pub duration: std::time::Duration,
    pub root_cause: String,
    pub resolution: String,
    pub action_items: Vec<String>,
    pub related_resources: Vec<String>,
    pub metadata: HashMap<String, String>,
    pub tags: Vec<String>,
    pub author: String,
}
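
A hypothetical postmortem record built with the constructor above (the values mirror the PostgreSQL incident used in the MCP examples below):

// Build an Execution node for an incident postmortem
let now = Utc::now();
let node = Node::new_execution(
    "PostgreSQL Connection Pool Exhaustion",
    IncidentDetails {
        started_at: now - chrono::Duration::hours(2),
        detected_at: now - chrono::Duration::hours(1),
        resolved_at: now,
        duration: std::time::Duration::from_secs(2 * 3600),
        root_cause: "Connection leak in application code".into(),
        resolution: "Added PgBouncer, fixed the leak".into(),
        action_items: vec!["Alert at 80% pool utilization".into()],
        related_resources: vec!["db-cluster/postgres-prod".into()],
        metadata: HashMap::new(),
        tags: vec!["incident".into(), "postgresql".into()],
        author: "ops-oncall".into(),
    },
);
assert!(matches!(node.node_type, NodeType::Execution));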

MCP Tools (Ops Workflows)

# Search troubleshooting runbooks
kogral-mcp search "nginx 502 error troubleshooting" --type note

# Add an incident postmortem
kogral-mcp add-execution \
  --title "2026-01-22 PostgreSQL Connection Pool Exhaustion" \
  --context "Production database connections maxed out at 100/100" \
  --root-cause "Connection leak in application code, connections not released" \
  --resolution "Increased max_connections from 100 to 200, added PgBouncer pooler, fixed connection leak" \
  --action-items "Implement connection pool monitoring, add alerts at 80% utilization" \
  --tags "database,incident,postgresql,production"

# Fetch deployment guidelines
kogral-mcp get-guidelines "kubernetes deployment" --include-shared true

# Create an ADR for an infrastructure decision
kogral-mcp add-decision \
  --title "Choose Cilium over Calico for CNI" \
  --context "Need Kubernetes CNI with eBPF support and service mesh capabilities" \
  --decision "Selected Cilium for better performance (eBPF) and built-in service mesh" \
  --consequences "Higher complexity initially, better performance long-term, requires Linux kernel 4.9+"

# List all postmortems (Execution nodes)
kogral-mcp list --type execution --tags "incident"

# Export the knowledge graph to markdown
kogral-mcp export --format markdown --output /docs/ops-knowledge/

6. Cross-Project Integration (Ops Stack)

Data Flow Diagram

                    ┌───────────────────┐
                    │      Kogral       │
                    │ (Runbooks, ADRs)  │
                    └─────────┬─────────┘
                              │
             MCP (operational knowledge)
                              │
    ┌─────────────────────────┼─────────────────────────┐
    │                         │                         │
    ▼                         ▼                         ▼
┌─────────────┐  ┌─────────────────┐  ┌─────────────────┐
│ TypeDialog  │  │     Vapora      │  │  Provisioning   │
│ (Wizards)   │  │  (Ops Agents)   │  │   (IaC Deploy)  │
└──────┬──────┘  └────────┬────────┘  └────────┬────────┘
       │                  │                    │
       │    ┌─────────────┴─────────────┐      │
       │    │                           │      │
       ▼    ▼                           ▼      ▼
  ┌───────────────────────────────────────────────┐
  │             SECRETUMVAULT                     │
  │  PKI certs │ DB creds │ API keys │ Encryption │
  └───────────────────────────────────────────────┘
                         │
                         ▼
  ┌───────────────────────────────────────────────┐
  │          PERSISTENCE LAYER                    │
  │  SurrealDB │ NATS JetStream │ etcd │ Git      │
  └───────────────────────────────────────────────┘

Shared Dependencies (Ops Stack)

# Common dependencies (Cargo.toml)
[dependencies]
# Runtime
tokio = { version = "1.48", features = ["full"] }

# Serialization
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
toml = "0.8"

# Database
surrealdb = "2.3"
etcd-client = "0.14"

# Web/API
axum = { version = "0.8", features = ["macros"] }
tower = "0.5"
tower-http = { version = "0.6", features = ["cors", "compression-gzip"] }

# Config
nickel-lang-core = "1.15"

# Logging/Tracing
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter", "json"] }
tracing-opentelemetry = "0.27"

# Metrics
prometheus = "0.13"

# Security
cedar-policy = "4.3"
jsonwebtoken = "9.3"

# Crypto
openssl = { version = "0.10", optional = true }
oqs = { version = "0.10", optional = true }  # Post-Quantum

# Error handling
anyhow = "1.0"
thiserror = "2.0"

Integration Example: AI-Assisted CI/CD Pipeline

// Example of an integrated Ops pipeline
use anyhow::{anyhow, Result};
use chrono::Utc;
use serde_json::json;

use kogral_mcp::KogralMcpClient;
use provisioning_client::ProvisioningClient;
use secretumvault_client::VaultClient;
use vapora_client::VaporaClient;

async fn deploy_microservice(config: DeploymentConfig) -> Result<DeploymentResult> {
    // 1. Kogral: Fetch deployment guidelines
    let kogral = KogralMcpClient::connect("http://localhost:3100").await?;
    let guidelines = kogral.call("get_guidelines", json!({
        "topic": "kubernetes deployment",
        "include_shared": true
    })).await?;

    // 2. Vapora: Orchestrate the pipeline with agents
    let vapora = VaporaClient::new("http://localhost:8001");

    // Security Agent: Vulnerability scan
    let scan_task = vapora.create_task(TaskRequest {
        title: "Scan Docker image",
        task_type: "security_scan",
        context: json!({
            "image": config.docker_image,
            "guidelines": guidelines,
        }),
    }).await?;
    vapora.assign_task(&scan_task.id, AgentRole::Security).await?;
    let scan_result = vapora.wait_for_completion(&scan_task.id).await?;

    if !scan_result.passed {
        return Err(anyhow!("Security scan failed: {}", scan_result.issues));
    }

    // DevOps Agent: Validate manifests
    let validate_task = vapora.create_task(TaskRequest {
        title: "Validate K8s manifests",
        task_type: "manifest_validation",
        context: json!({
            "manifests_path": config.manifests_path,
            "cluster": config.cluster,
        }),
    }).await?;
    vapora.assign_task(&validate_task.id, AgentRole::DevOps).await?;
    vapora.wait_for_completion(&validate_task.id).await?;

    // 3. SecretumVault: Fetch secrets for the deployment
    let vault = VaultClient::new("http://localhost:8200", &config.vault_token);

    // Database credentials (dynamic)
    let db_creds = vault.read("database/creds/myapp-role").await?;

    // API keys (KV engine)
    let api_keys = vault.read("secret/data/myapp/api-keys").await?;

    // 4. Provisioning: Deploy via kubectl
    let prov = ProvisioningClient::new("http://localhost:8002");
    let deploy_result = prov.execute_workflow(WorkflowRequest {
        workflow_type: "kubernetes_deploy",
        config: json!({
            "cluster": config.cluster,
            "namespace": config.namespace,
            "manifests": config.manifests_path,
            "secrets": {
                "db_username": db_creds["username"],
                "db_password": db_creds["password"],
                "api_key": api_keys["data"]["api_key"],
            },
        }),
    }).await?;

    // 5. Vapora Monitor Agent: Setup health checks
    let monitor_task = vapora.create_task(TaskRequest {
        title: "Setup Prometheus alerts",
        task_type: "monitoring_setup",
        context: json!({
            "service": config.service_name,
            "namespace": config.namespace,
            "endpoints": config.health_endpoints,
        }),
    }).await?;
    vapora.assign_task(&monitor_task.id, AgentRole::Monitor).await?;

    // 6. Kogral: Document the deployment
    kogral.call("add_execution", json!({
        "title": format!("Deploy {} v{}", config.service_name, config.version),
        "context": format!("Deployed to {}/{}", config.cluster, config.namespace),
        "resolution": "Deployment successful",
        "metadata": {
            "service": config.service_name,
            "version": config.version,
            "cluster": config.cluster,
            "namespace": config.namespace,
            "commit_sha": config.commit_sha,
        },
        "tags": vec!["deployment", "kubernetes", config.service_name.as_str()],
    })).await?;

    Ok(DeploymentResult {
        status: "success".into(),
        deployed_at: Utc::now(),
        health_checks_passed: true,
    })
}

7. Quality Metrics (Ops Perspective)

Project         Tests   Coverage   Clippy        Unsafe blocks   Performance
Provisioning    218     ~65%       0 warnings    0               Rust orchestrator 10-50x faster than Python
SecretumVault   50+     ~75%       0 warnings    0               Crypto ops <10ms (classical), <20ms (PQC)
Vapora          218     ~70%       0 warnings    0               NATS latency <5ms, task assignment <100ms
TypeDialog      3,818   ~85%       0 warnings    0               Form validation <1ms, IaC gen <500ms
Kogral          56      ~80%       0 warnings    0               Semantic search <200ms (fastembed local)

Ops Verification Commands

# Provisioning
cd provisioning
cargo clippy --all-targets --all-features -- -D warnings
cargo test --workspace
just ci-test  # Run CI tests locally

# SecretumVault
cd secretumvault
cargo test --all-features
cargo bench  # Crypto benchmarks

# Vapora
cd vapora
cargo test --workspace
docker-compose up -d  # Integration tests with NATS + SurrealDB

# TypeDialog
cd typedialog
cargo test --workspace --all-features
cargo run --example prov-gen  # Test IaC generation

# Kogral
cd kogral
cargo test
kogral serve &  # Start MCP server
curl http://localhost:3100/health  # Health check

Document generated: 2026-01-22. Type: info (Ops/DevOps technical specifications)