provisioning/docs/src/architecture/adr/adr-014-secretumvault-integration.md
2026-01-08 21:22:57 +00:00


ADR-014: SecretumVault Integration for Secrets Management

Status

Accepted - 2025-01-08

Context

The provisioning system manages sensitive data across multiple infrastructure layers: cloud provider credentials, database passwords, API keys, SSH keys, encryption keys, and service tokens. The current security architecture (ADR-009) includes SOPS for encrypted config files and Age for key management, but lacks a centralized secrets management solution with dynamic secrets, access control, and audit logging.

Current Secrets Management Challenges

Existing Approach:

  1. SOPS + Age: Static secrets encrypted in config files

    • Good: Version-controlled, gitops-friendly
    • Limited: Manual rotation only, no access audit trail (git records changes, not reads), manual key distribution
  2. Nickel Configuration: Declarative secrets references

    • Good: Type-safe configuration
    • Limited: Cannot generate dynamic secrets, no lifecycle management
  3. Manual Secret Injection: Environment variables, CLI flags

    • Good: Simple for development
    • Limited: No security guarantees, prone to leakage

Problems Without Centralized Secrets Management

Security Issues:

  • No centralized audit trail (who accessed which secret when)
  • No automatic secret rotation policies
  • No fine-grained access control (Cedar policies not enforced on secrets)
  • Secrets scattered across SOPS files, env vars, config files, and K8s secrets
  • No detection of secret sprawl or leaked credentials

Operational Issues:

  • Manual secret rotation (error-prone, often neglected)
  • No secret versioning (cannot rollback to previous credentials)
  • Difficult onboarding (manual key distribution)
  • No dynamic secrets (credentials exist indefinitely)

Compliance Issues:

  • Cannot prove compliance with secret access policies
  • No audit logs for regulatory requirements
  • Cannot enforce secret expiration policies
  • Difficult to demonstrate least-privilege access

Use Cases Requiring Centralized Secrets Management

  1. Dynamic Database Credentials:

    • Generate short-lived DB credentials for applications
    • Automatic rotation based on policies
    • Revocation on application termination
  2. Cloud Provider API Keys:

    • Centralized storage with access control
    • Audit trail of credential usage
    • Automatic rotation schedules
  3. Service-to-Service Authentication:

    • Dynamic tokens for microservices
    • Short-lived certificates for mTLS
    • Automatic renewal before expiration
  4. SSH Key Management:

    • Temporal SSH keys (ADR-009 SSH integration)
    • Centralized certificate authority
    • Audit trail of SSH access
  5. Encryption Key Management:

    • Master encryption keys for data at rest
    • Key rotation and versioning
    • Integration with KMS systems
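The renewal and revocation requirements above reduce to simple timing rules on a lease. The following is a minimal sketch that assumes a lease is just an ID, an issue time and a TTL. The `Lease` type and the "renew when about a third of the TTL remains" threshold are illustrative assumptions, not SecretumVault's API.

```rust
use std::time::{Duration, Instant};

/// Hypothetical client-side view of the lease attached to a dynamic secret.
#[derive(Debug, Clone)]
pub struct Lease {
    pub id: String,
    pub issued_at: Instant,
    pub ttl: Duration,
}

impl Lease {
    /// Time left before the lease expires (zero once it has expired).
    pub fn remaining(&self, now: Instant) -> Duration {
        let expires_at = self.issued_at + self.ttl;
        expires_at.saturating_duration_since(now)
    }

    /// Renew once no more than `fraction` of the TTL is left
    /// (e.g. 0.33 renews when roughly one third of the lease remains).
    pub fn should_renew(&self, now: Instant, fraction: f64) -> bool {
        self.remaining(now) <= self.ttl.mul_f64(fraction)
    }

    /// After this point the credential must be treated as revoked.
    pub fn is_expired(&self, now: Instant) -> bool {
        self.remaining(now).is_zero()
    }
}
```

A renewal loop would call `should_renew` periodically and obtain a fresh lease before `is_expired` turns true. Renewing at a fraction of the TTL leaves headroom for network delays and retries.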

Requirements for Secrets Management System

  • Dynamic Secrets: Generate credentials on-demand with TTL
  • Access Control: Integration with Cedar authorization policies
  • Audit Logging: Complete trail of secret access and modifications
  • Secret Rotation: Automatic and manual rotation policies
  • Versioning: Track secret versions, enable rollback
  • High Availability: Distributed, fault-tolerant architecture
  • Encryption at Rest: AES-256-GCM for stored secrets
  • API-First: RESTful API for integration
  • Plugin Ecosystem: Extensible backends (AWS, Azure, databases)
  • Open Source: Self-hosted, no vendor lock-in

Decision

Integrate SecretumVault as the centralized secrets management system for the provisioning platform.

Architecture Diagram

┌─────────────────────────────────────────────────────────────┐
│   Provisioning CLI / Orchestrator / Services                │
│                                                             │
│   - Workspace initialization (credentials)                  │
│   - Infrastructure deployment (cloud API keys)              │
│   - Service configuration (database passwords)              │
│   - SSH temporal keys (certificate generation)              │
└────────────┬────────────────────────────────────────────────┘
             │
             ▼
┌─────────────────────────────────────────────────────────────┐
│   SecretumVault Client Library (Rust)                       │
│   (provisioning/core/libs/secretum-client/)                 │
│                                                             │
│   - Authentication (token, mTLS)                            │
│   - Secret CRUD operations                                  │
│   - Dynamic secret generation                               │
│   - Lease renewal and revocation                            │
│   - Policy enforcement                                      │
└────────────┬────────────────────────────────────────────────┘
             │ HTTPS + mTLS
             ▼
┌─────────────────────────────────────────────────────────────┐
│   SecretumVault Server                                      │
│   (Rust-based Vault implementation)                         │
│                                                             │
│   ┌───────────────────────────────────────────────────┐    │
│   │ API Layer (REST + gRPC)                           │    │
│   ├───────────────────────────────────────────────────┤    │
│   │ Authentication & Authorization                    │    │
│   │ - Token auth, mTLS, OIDC integration              │    │
│   │ - Cedar policy enforcement                        │    │
│   ├───────────────────────────────────────────────────┤    │
│   │ Secret Engines                                    │    │
│   │ - KV (key-value v2 with versioning)               │    │
│   │ - Database (dynamic credentials)                  │    │
│   │ - SSH (certificate authority)                     │    │
│   │ - PKI (X.509 certificates)                        │    │
│   │ - Cloud Providers (AWS/Azure/OCI)                 │    │
│   ├───────────────────────────────────────────────────┤    │
│   │ Storage Backend                                   │    │
│   │ - Encrypted storage (AES-256-GCM)                 │    │
│   │ - PostgreSQL / Raft cluster                       │    │
│   ├───────────────────────────────────────────────────┤    │
│   │ Audit Backend                                     │    │
│   │ - Structured logging (JSON)                       │    │
│   │ - Syslog, file, database sinks                    │    │
│   └───────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────┘
             │
             ▼
┌─────────────────────────────────────────────────────────────┐
│   Backends (Dynamic Secret Generation)                      │
│                                                             │
│   - PostgreSQL/MySQL (database credentials)                 │
│   - AWS IAM (temporary access keys)                         │
│   - Azure AD (service principals)                           │
│   - SSH CA (signed certificates)                            │
│   - PKI (X.509 certificates)                                │
└─────────────────────────────────────────────────────────────┘

Implementation Characteristics

SecretumVault Provides:

  • Dynamic secret generation with configurable TTL
  • Secret versioning and rollback capabilities
  • Fine-grained access control (Cedar policies)
  • Complete audit trail (all operations logged)
  • Automatic secret rotation policies
  • High availability (Raft consensus)
  • Encryption at rest (AES-256-GCM)
  • Plugin architecture for secret backends
  • RESTful and gRPC APIs
  • Rust implementation (performance, safety)
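Secret versioning and rollback can be modeled as an append-only history per path. The in-memory `VersionedKv` below is a hypothetical illustration, not SecretumVault's storage engine. Its rollback writes the old value back as a new version, which is how HashiCorp Vault's KV v2 `rollback` behaves. As a result, the history never loses the version that was rolled back from.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory model of KV-v2-style versioning.
/// Every write appends a new version; nothing is overwritten.
#[derive(Default)]
pub struct VersionedKv {
    entries: HashMap<String, Vec<String>>,
}

impl VersionedKv {
    /// Stores a new value and returns its version number (1-based).
    pub fn put(&mut self, path: &str, value: &str) -> usize {
        let versions = self.entries.entry(path.to_string()).or_default();
        versions.push(value.to_string());
        versions.len()
    }

    /// Reads the latest version (`None`) or a specific version (`Some(n)`).
    pub fn get(&self, path: &str, version: Option<usize>) -> Option<&str> {
        let versions = self.entries.get(path)?;
        let idx = match version {
            Some(v) if v >= 1 => v - 1,
            Some(_) => return None, // version 0 does not exist
            None => versions.len().checked_sub(1)?,
        };
        versions.get(idx).map(String::as_str)
    }

    /// Re-writes an old version as the newest one; returns the new version number.
    pub fn rollback(&mut self, path: &str, version: usize) -> Option<usize> {
        let old = self.get(path, Some(version))?.to_string();
        Some(self.put(path, &old))
    }
}
```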

Integration with Provisioning System:

  • Rust client library (native integration)
  • Nushell commands via CLI wrapper
  • Nickel configuration references secrets
  • Cedar policies control secret access
  • Orchestrator manages secret lifecycle
  • SSH integration for temporal keys
  • KMS integration for encryption keys

Rationale

Why SecretumVault Is Required

Aspect            | SOPS + Age (current)        | HashiCorp Vault                        | SecretumVault (chosen)
------------------|-----------------------------|----------------------------------------|-------------------------------
Dynamic Secrets   | Static only                 | Full support                           | Full support
Rust Native       | ⚠️ External CLI             | Go binary                              | Pure Rust
Cedar Integration | None                        | HCL ACL policies                       | Native Cedar
Audit Trail       | Git only                    | Comprehensive                          | Comprehensive
Secret Rotation   | Manual                      | Automatic                              | Automatic
Open Source       | Yes                         | ⚠️ No since 2023 (previously MPL 2.0)  | Yes
Self-Hosted       | Yes                         | Yes                                    | Yes
License           | MPL 2.0 (SOPS), BSD-3 (age) | ⚠️ BSL 1.1 (source-available)          | Permissive
Versioning        | ⚠️ Git commits              | Built-in                               | Built-in
High Availability | N/A (single file)           | Raft cluster                           | Raft cluster
Performance       | Fast (local decryption)     | ⚠️ Network round trip                  | Network round trip (Rust server)

Why Not Continue with SOPS Alone?

SOPS is excellent for static secrets in git, but inadequate for:

  1. Dynamic Credentials: Cannot generate temporary DB passwords
  2. Audit Trail: Git history records who changed a secret, not who read it, which is insufficient for compliance
  3. Rotation Policies: Manual rotation is error-prone
  4. Access Control: No runtime policy enforcement
  5. Secret Lifecycle: Cannot track usage or revoke access
  6. Multi-System Integration: Limited to files, not API-accessible

Complementary Approach:

  • SOPS: Configuration files with long-lived secrets (gitops workflow)
  • SecretumVault: Runtime dynamic secrets, short-lived credentials, audit trail

Why SecretumVault Over HashiCorp Vault?

HashiCorp Vault Limitations:

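  1. License Change: Since 2023, releases ship under the Business Source License 1.1, which is source-available rather than open source and restricts use in competing products
  2. Not Rust Native: Go server; Rust integration relies on third-party HTTP client crates or calling the vault CLI as a subprocess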
  3. Custom Policy Language: HCL policies, not Cedar (provisioning standard)
  4. Complex Deployment: Heavy operational burden
  5. Vendor Lock-In: HashiCorp ecosystem dependency

SecretumVault Advantages:

  1. Rust Native: First-party Rust client in the same toolchain as the provisioning codebase, no subprocess spawning
  2. Cedar Policies: Consistent with ADR-008 authorization model
  3. Lightweight: Smaller binary, lower resource usage
  4. Open Source: Permissive license, community-driven
  5. Provisioning-First: Designed for IaC workflows

Integration with Existing Security Architecture

ADR-009 (Security System):

  • SOPS: Static config encryption (unchanged)
  • Age: Key management for SOPS (unchanged)
  • SecretumVault: Dynamic secrets, runtime access control (new)

ADR-008 (Cedar Authorization):

  • Cedar policies control SecretumVault secret access
  • Fine-grained permissions: read:secret:database/prod/password
  • Audit trail records Cedar policy decisions

SSH Temporal Keys:

  • SecretumVault SSH CA signs user certificates
  • Short-lived certificates (1-24 hours)
  • Audit trail of SSH access

Consequences

Positive

  • Security Posture: Centralized secrets with audit trail and rotation
  • Compliance: Complete audit logs for regulatory requirements
  • Operational Excellence: Automatic rotation, dynamic credentials
  • Developer Experience: Simple API for secret access
  • Performance: Rust implementation, zero-cost abstractions
  • Consistency: Cedar policies across entire system (auth + secrets)
  • Observability: Metrics, logs, traces for secret access
  • Disaster Recovery: Secret versioning enables rollback

Negative

  • Infrastructure Complexity: Additional service to deploy and operate
  • High Availability Requirements: A Raft cluster needs at least 3 nodes (an odd number) to survive a node failure
  • Migration Effort: Existing SOPS secrets need a migration path
  • Learning Curve: Operators must learn vault concepts
  • Dependency Risk: Critical-path service (if the vault is unavailable, anything that needs a secret fails)

Mitigation Strategies

High Availability:

# Deploy SecretumVault cluster (3 nodes)
provisioning deploy secretum-vault --ha --replicas 3

# Automatic leader election via Raft
# Clients auto-reconnect to leader
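The 3-replica recommendation follows from Raft's majority quorum. A leader can only be elected, and writes committed, while a strict majority of voters is reachable. The arithmetic is sketched below; the function names are illustrative.

```rust
/// Majority quorum for a Raft cluster with `nodes` voting members.
pub fn quorum(nodes: u32) -> u32 {
    nodes / 2 + 1
}

/// How many voters can fail while the cluster can still elect a leader.
pub fn tolerated_failures(nodes: u32) -> u32 {
    nodes.saturating_sub(quorum(nodes))
}
```

A 4-node cluster tolerates no more failures than a 3-node one (both survive one), which is why odd replica counts are the norm.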

Migration from SOPS:

# Phase 1: Import existing SOPS secrets into SecretumVault
provisioning secrets migrate --from-sops config/secrets.yaml

# Phase 2: Update Nickel configs to reference vault paths
# Phase 3: Deprecate SOPS for runtime secrets (keep for config files)
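Phase 1 has to map the nested keys of a decrypted SOPS file onto vault paths. The ADR does not define that mapping. The sketch below assumes nested keys are joined with `/`, so `database.password` in the SOPS file becomes `database/password`. It also uses a simplified, hypothetical `Value` type in place of a real YAML parser.

```rust
use std::collections::BTreeMap;

/// Simplified decrypted SOPS document: string leaves and nested maps only
/// (real SOPS files can also contain lists and numbers).
pub enum Value {
    Str(String),
    Map(BTreeMap<String, Value>),
}

/// Flattens nested keys into slash-separated vault paths, collecting
/// `path -> secret value` pairs into `out`.
pub fn flatten(prefix: &str, value: &Value, out: &mut BTreeMap<String, String>) {
    match value {
        Value::Str(s) => {
            out.insert(prefix.to_string(), s.clone());
        }
        Value::Map(children) => {
            for (key, child) in children {
                let path = if prefix.is_empty() {
                    key.clone()
                } else {
                    format!("{prefix}/{key}")
                };
                flatten(&path, child, out);
            }
        }
    }
}
```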

Fallback Strategy:

// Graceful degradation if the vault is unavailable
let secret = match vault_client.get_secret("database/password").await {
    Ok(s) => s,
    Err(VaultError::Unavailable) => {
        // Fall back to SOPS for read-only access to static secrets;
        // dynamic credentials cannot be served this way
        warn!("Vault unavailable, using SOPS fallback");
        sops_decrypt("config/secrets.yaml", "database.password")?
    }
    // Any other vault error (e.g. permission denied) must not silently fall back
    Err(e) => return Err(e.into()),
};

Operational Monitoring:

# Prometheus metrics
secretum_vault_request_duration_seconds
secretum_vault_secret_lease_expiry
secretum_vault_auth_failures_total
secretum_vault_raft_leader_changes

# Alerts: Vault unavailable, high auth failure rate, lease expiry

Alternatives Considered

Alternative 1: Continue with SOPS Only

Pros: No new infrastructure, simple
Cons: No dynamic secrets, no audit trail, manual rotation
Decision: REJECTED - Insufficient for production security

Alternative 2: HashiCorp Vault

Pros: Mature, feature-rich, widely adopted
Cons: BSL 1.1 licensing, Go binary, HCL policies (not Cedar), complex deployment
Decision: REJECTED - License and integration concerns

Alternative 3: Cloud Provider Native (AWS Secrets Manager, Azure Key Vault)

Pros: Fully managed, high availability
Cons: Vendor lock-in, multi-cloud complexity, cost at scale
Decision: REJECTED - Against open-source and multi-cloud principles

Alternative 4: CyberArk, 1Password, etc.

Pros: Enterprise features
Cons: Proprietary, expensive, poor API integration
Decision: REJECTED - Not suitable for IaC automation

Alternative 5: Build Custom Secrets Manager

Pros: Full control, tailored to needs
Cons: High maintenance burden, security risk, reinventing the wheel
Decision: REJECTED - SecretumVault already provides this

Implementation Details

SecretumVault Deployment

# Deploy via provisioning system
provisioning deploy secretum-vault \
  --ha \
  --replicas 3 \
  --storage postgres \
  --tls-cert /path/to/cert.pem \
  --tls-key /path/to/key.pem

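# Initialize: split the unseal key into 5 shares, any 3 of which can unseal
provisioning vault init --key-shares 5 --key-threshold 3

# Unseal: run once per key share until the threshold (3) is reached
provisioning vault unseal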
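`--key-shares 5 --key-threshold 3` describes Shamir secret sharing. The unseal key is split into 5 shares; any 3 reconstruct it, while 2 reveal nothing about it. The toy version below shares a single `u64` over the prime field GF(2^61 - 1), using Lagrange interpolation at x = 0. HashiCorp Vault instead splits byte strings over GF(2^8), and this ADR does not state which scheme SecretumVault uses. The function names are hypothetical.

```rust
/// Field modulus: the Mersenne prime 2^61 - 1. All share arithmetic is mod P.
const P: u128 = (1u128 << 61) - 1;

/// Square-and-multiply exponentiation mod P (operands < 2^61, products fit in u128).
fn mod_pow(mut base: u128, mut exp: u128) -> u128 {
    let mut acc = 1u128;
    base %= P;
    while exp > 0 {
        if (exp & 1) == 1 {
            acc = acc * base % P;
        }
        base = base * base % P;
        exp >>= 1;
    }
    acc
}

/// Modular inverse via Fermat's little theorem (valid because P is prime).
fn inv(a: u128) -> u128 {
    mod_pow(a, P - 2)
}

/// Splits `secret` into `shares` points (x, f(x)) on
/// f(x) = secret + c1*x + c2*x^2 + ..., with `coeffs` = [c1, c2, ...]
/// holding threshold - 1 entries. Coefficients are passed in to keep the
/// sketch dependency-free; real use needs a cryptographically secure RNG.
pub fn split(secret: u64, coeffs: &[u64], shares: u64) -> Vec<(u64, u64)> {
    assert!((secret as u128) < P, "secret must be smaller than the field modulus");
    (1..=shares)
        .map(|x| {
            let x128 = x as u128;
            // Horner's rule, highest-degree coefficient first.
            let mut y = 0u128;
            for &c in coeffs.iter().rev() {
                y = (y * x128 + c as u128) % P;
            }
            y = (y * x128 + secret as u128) % P;
            (x, y as u64)
        })
        .collect()
}

/// Recovers f(0) = secret from at least `threshold` distinct shares.
pub fn combine(shares: &[(u64, u64)]) -> u64 {
    let mut secret = 0u128;
    for (i, &(xi, yi)) in shares.iter().enumerate() {
        let (xi, yi) = (xi as u128 % P, yi as u128 % P);
        let (mut num, mut den) = (1u128, 1u128);
        for (j, &(xj, _)) in shares.iter().enumerate() {
            if i != j {
                let xj = xj as u128 % P;
                // Lagrange basis at zero: product of (0 - xj) / (xi - xj).
                num = num * ((P - xj) % P) % P;
                den = den * ((xi + P - xj) % P) % P;
            }
        }
        secret = (secret + yi * num % P * inv(den)) % P;
    }
    secret as u64
}
```

With 3 of the 5 shares, interpolation recovers the degree-2 polynomial exactly. With only 2 shares, it fits a line through them and returns an unrelated value.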

Rust Client Library

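// provisioning/core/libs/secretum-client/src/lib.rs

use std::time::Duration;

// Client, Auth, TlsConfig and the result/secret types are assumed to be
// exported by the secretum_vault crate.
use secretum_vault::{Auth, Certificate, Client, DbCredentials, Result, Secret, TlsConfig};

/// Thin wrapper around the SecretumVault client used by provisioning tools.
pub struct VaultClient {
    client: Client,
}

impl VaultClient {
    /// Connects with token auth over mTLS (CA bundle, client cert, client key).
    pub async fn new(addr: &str, token: &str) -> Result<Self> {
        let client = Client::new(addr)
            .auth(Auth::Token(token))
            .tls_config(TlsConfig::from_files("ca.pem", "cert.pem", "key.pem"))?
            .build()?;

        Ok(Self { client })
    }

    /// Reads the latest version of a KV v2 secret.
    pub async fn get_secret(&self, path: &str) -> Result<Secret> {
        self.client.kv2().get(path).await
    }

    /// Generates short-lived database credentials for `role`.
    pub async fn create_dynamic_db_credentials(&self, role: &str) -> Result<DbCredentials> {
        self.client.database().generate_credentials(role).await
    }

    /// Signs an SSH public key with the vault's SSH CA, valid for `ttl`.
    pub async fn sign_ssh_key(&self, public_key: &str, ttl: Duration) -> Result<Certificate> {
        self.client.ssh().sign_key(public_key, ttl).await
    }
}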

Nushell Integration

# Nushell commands via Rust CLI wrapper
provisioning secrets get database/prod/password
provisioning secrets set api/keys/stripe --value "sk_live_xyz"
provisioning secrets rotate database/prod/password
provisioning secrets lease renew lease_id_12345
provisioning secrets list database/

Nickel Configuration Integration

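# provisioning/schemas/database.ncl
# `secrets` is the provisioning secrets helper library (import path illustrative)
let secrets = import "secrets.ncl" in
{
  database = {
    host = "postgres.example.com",
    port = 5432,
    username = secrets.get "database/prod/username",
    password = secrets.get "database/prod/password",
  }
}

# secrets.get refers to a SecretumVault path. Nickel evaluation is pure (no
# network I/O), so the call yields a secret reference that the provisioning
# CLI resolves through the SecretumVault API after evaluation.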
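Nickel evaluation is pure and cannot call an API, so `secrets.get` has to leave something the provisioning tooling can resolve later. One possible design, assumed here, is a placeholder node in the exported configuration that a Rust resolver replaces with the value read from SecretumVault. The `Config` enum and `resolve` function are hypothetical; `lookup` stands in for the vault read.

```rust
use std::collections::BTreeMap;

/// Exported configuration after Nickel evaluation, with a placeholder
/// variant for secret references.
#[derive(Debug, Clone, PartialEq)]
pub enum Config {
    Str(String),
    Int(i64),
    SecretRef(String),
    Record(BTreeMap<String, Config>),
}

/// Replaces every SecretRef with the value returned by `lookup`,
/// failing on the first secret that cannot be found.
pub fn resolve<F>(cfg: &Config, lookup: &F) -> Result<Config, String>
where
    F: Fn(&str) -> Option<String>,
{
    match cfg {
        Config::SecretRef(path) => lookup(path.as_str())
            .map(Config::Str)
            .ok_or_else(|| format!("secret not found: {path}")),
        Config::Record(fields) => {
            let mut out = BTreeMap::new();
            for (key, value) in fields {
                out.insert(key.clone(), resolve(value, lookup)?);
            }
            Ok(Config::Record(out))
        }
        other => Ok(other.clone()),
    }
}
```

Failing on the first missing secret prevents a deployment from proceeding with an incomplete configuration.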

Cedar Policy for Secret Access

// policy: developers can read dev secrets, not prod
permit(
  principal in Group::"developers",
  action == Action::"read",
  resource in Secret::"database/dev"
);

forbid(
  principal in Group::"developers",
  action == Action::"read",
  resource in Secret::"database/prod"
);

// policy: CI/CD can generate dynamic DB credentials
permit(
  principal == Service::"github-actions",
  action == Action::"generate",
  resource in Secret::"database/dynamic"
) when {
  context.ttl <= duration("1h")
};
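The two developer policies rely on Cedar's combining rule. Requests are denied by default, a matching `permit` allows them, and any matching `forbid` overrides every `permit`. The self-contained model below reproduces just that rule. It is not the Cedar engine: it treats `resource in Secret::"..."` as a path-hierarchy check (an assumption about how the secret hierarchy is modeled) and ignores `when` clauses such as the TTL condition.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Effect {
    Permit,
    Forbid,
}

/// One simplified policy: principal group, action, and resource path.
pub struct Rule {
    pub effect: Effect,
    pub group: &'static str,
    pub action: &'static str,
    pub resource: &'static str,
}

/// "database/prod/password" is in "database/prod",
/// but "database/devops" is not in "database/dev".
fn in_hierarchy(resource: &str, parent: &str) -> bool {
    resource == parent
        || (resource.starts_with(parent) && resource[parent.len()..].starts_with('/'))
}

fn matches(rule: &Rule, groups: &[&str], action: &str, resource: &str) -> bool {
    groups.contains(&rule.group) && rule.action == action && in_hierarchy(resource, rule.resource)
}

/// Deny by default; allow only if some permit matches and no forbid matches.
pub fn is_allowed(rules: &[Rule], groups: &[&str], action: &str, resource: &str) -> bool {
    let mut permitted = false;
    for rule in rules.iter().filter(|r| matches(r, groups, action, resource)) {
        match rule.effect {
            Effect::Forbid => return false, // forbid always wins
            Effect::Permit => permitted = true,
        }
    }
    permitted
}
```

Because `forbid` wins regardless of order, a developer who also belongs to a group with broad read access is still blocked from `database/prod`.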

Dynamic Database Credentials

// Application requests temporary DB credentials via the VaultClient wrapper
let creds = vault_client
    .create_dynamic_db_credentials("postgres-readonly")
    .await?;

println!("Username: {}", creds.username);     // e.g. v-app-abcd1234
// Never print creds.password outside a demo; pass it straight to the DB driver
println!("TTL: {:?}", creds.lease_duration);  // 3600s if lease_duration is a Duration

// Credentials are revoked automatically when the lease TTL expires,
// so no manual cleanup is needed
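Automatic revocation after the TTL implies a server-side lease table that is swept periodically. Here is a minimal sketch with a hypothetical `LeaseTable`. It only decides which database users to revoke; issuing the actual statement (for example `DROP ROLE` in PostgreSQL) is left to the caller.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical lease table: lease_id -> (database username, expiry).
#[derive(Default)]
pub struct LeaseTable {
    leases: HashMap<String, (String, Instant)>,
}

impl LeaseTable {
    /// Records a newly generated credential and when its lease ends.
    pub fn issue(&mut self, lease_id: &str, username: &str, now: Instant, ttl: Duration) {
        self.leases.insert(lease_id.to_string(), (username.to_string(), now + ttl));
    }

    /// Early revocation, e.g. when the application terminates.
    /// Returns the username whose database account should be dropped.
    pub fn revoke(&mut self, lease_id: &str) -> Option<String> {
        self.leases.remove(lease_id).map(|(user, _)| user)
    }

    /// Revokes every lease whose expiry has passed and returns the
    /// affected usernames, sorted, so the caller can drop the accounts.
    pub fn sweep(&mut self, now: Instant) -> Vec<String> {
        let expired: Vec<String> = self
            .leases
            .iter()
            .filter(|(_, entry)| entry.1 <= now)
            .map(|(id, _)| id.clone())
            .collect();
        let mut users: Vec<String> = expired.iter().filter_map(|id| self.revoke(id)).collect();
        users.sort();
        users
    }
}
```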

Secret Rotation Automation

# secretum-vault config
[[rotation_policies]]
path = "database/prod/password"
schedule = "0 0 * * 0"  # Weekly on Sunday midnight
max_age = "30d"

[[rotation_policies]]
path = "api/keys/stripe"
schedule = "0 0 1 * *"  # Monthly on 1st
max_age = "90d"
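Each policy pairs a cron `schedule` with a `max_age`. One reasonable reading, assumed here because the config does not define it, is that the schedule drives routine rotation while `max_age` is a hard ceiling checked independently, so a missed cron run still gets caught. The sketch parses the duration strings and flags overdue secrets. The accepted units (s, m, h, d) are an assumption, and cron parsing is out of scope.

```rust
use std::time::Duration;

/// Parses `max_age` strings such as "30d", "12h", "5m" or "45s".
pub fn parse_max_age(s: &str) -> Option<Duration> {
    let unit = s.chars().last()?;
    let n: u64 = s[..s.len() - unit.len_utf8()].parse().ok()?;
    let secs = match unit {
        's' => n,
        'm' => n.checked_mul(60)?,
        'h' => n.checked_mul(3_600)?,
        'd' => n.checked_mul(86_400)?,
        _ => return None,
    };
    Some(Duration::from_secs(secs))
}

/// True when the current secret version is older than `max_age`;
/// None when `max_age` cannot be parsed.
pub fn rotation_overdue(age: Duration, max_age: &str) -> Option<bool> {
    parse_max_age(max_age).map(|limit| age > limit)
}
```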

Audit Log Format

{
  "timestamp": "2025-01-08T12:34:56Z",
  "type": "request",
  "auth": {
    "client_token": "sha256:abc123...",
    "accessor": "hmac:def456...",
    "display_name": "service-orchestrator",
    "policies": ["default", "service-policy"]
  },
  "request": {
    "operation": "read",
    "path": "secret/data/database/prod/password",
    "remote_address": "10.0.1.5"
  },
  "response": {
    "status": 200
  },
  "cedar_policy": {
    "decision": "permit",
    "policy_id": "allow-orchestrator-read-secrets"
  }
}

Testing Strategy

Unit Tests:

use std::time::Duration;

// mock_vault_client() is a test helper returning a VaultClient backed by a stub server
#[tokio::test]
async fn test_get_secret() {
    let vault = mock_vault_client();
    let secret = vault.get_secret("test/secret").await.unwrap();
    assert_eq!(secret.value, "expected-value");
}

#[tokio::test]
async fn test_dynamic_credentials_generation() {
    let vault = mock_vault_client();
    let creds = vault.create_dynamic_db_credentials("postgres-readonly").await.unwrap();
    assert!(creds.username.starts_with("v-"));
    assert_eq!(creds.lease_duration, Duration::from_secs(3600));
}

Integration Tests:

set -e  # abort on the first failing check

# Test vault deployment
provisioning deploy secretum-vault --test-mode
provisioning vault init
provisioning vault unseal

# Test secret operations
provisioning secrets set test/secret --value "test-value"
test "$(provisioning secrets get test/secret)" = "test-value"

# Test dynamic credentials (generated usernames start with "v-")
provisioning secrets db-creds postgres-readonly | jq -r '.username' | grep -q '^v-'

# Test rotation
provisioning secrets rotate test/secret

Security Tests:

#[tokio::test]
async fn test_unauthorized_access_denied() {
    let vault = vault_client_with_limited_token();
    let result = vault.get_secret("database/prod/password").await;
    assert!(matches!(result, Err(VaultError::PermissionDenied)));
}
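The unit and security tests call `mock_vault_client()` and `vault_client_with_limited_token()`, which this ADR does not define. One way to back them is an in-memory store whose token carries a list of readable path prefixes. It is sketched synchronously here to avoid async dependencies; `MockVault` and its fields are hypothetical.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum VaultError {
    PermissionDenied,
    NotFound,
}

/// Hypothetical in-memory stand-in for the vault in tests.
/// The token's permissions are modeled as readable path prefixes.
pub struct MockVault {
    secrets: HashMap<String, String>,
    readable_prefixes: Vec<String>,
}

impl MockVault {
    pub fn new(readable_prefixes: &[&str]) -> Self {
        let secrets = HashMap::from([
            ("test/secret".to_string(), "expected-value".to_string()),
            ("database/prod/password".to_string(), "prod-password".to_string()),
        ]);
        Self {
            secrets,
            readable_prefixes: readable_prefixes.iter().map(|p| p.to_string()).collect(),
        }
    }

    /// Authorization is checked before existence, so a denied token
    /// cannot probe which paths exist.
    pub fn get_secret(&self, path: &str) -> Result<&str, VaultError> {
        if !self.readable_prefixes.iter().any(|p| path.starts_with(p.as_str())) {
            return Err(VaultError::PermissionDenied);
        }
        self.secrets.get(path).map(String::as_str).ok_or(VaultError::NotFound)
    }
}
```

A limited token therefore gets `PermissionDenied` for real and nonexistent paths alike, so the mock cannot be used to enumerate secrets.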

Configuration Integration

Provisioning Config:

# provisioning/config/config.defaults.toml
[secrets]
provider = "secretum-vault"  # "secretum-vault" | "sops" | "env"
vault_addr = "https://vault.example.com:8200"
vault_namespace = "provisioning"
vault_mount = "secret"

[secrets.tls]
ca_cert = "/etc/provisioning/vault-ca.pem"
client_cert = "/etc/provisioning/vault-client.pem"
client_key = "/etc/provisioning/vault-client-key.pem"

[secrets.cache]
enabled = true
ttl = "5m"
max_size = "100MB"
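`[secrets.cache]` implies a client-side cache that serves a secret for at most `ttl` before going back to the vault. The sketch below makes two stated simplifications. `max_size` is approximated by an entry count instead of a byte budget, and the oldest entry is evicted when the cache is full. `SecretCache` is a hypothetical type.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical client-side secret cache with TTL expiry and a capacity limit.
pub struct SecretCache {
    ttl: Duration,
    capacity: usize,
    entries: HashMap<String, (String, Instant)>, // path -> (value, cached_at)
}

impl SecretCache {
    pub fn new(ttl: Duration, capacity: usize) -> Self {
        Self { ttl, capacity, entries: HashMap::new() }
    }

    /// Returns the cached value while it is younger than the TTL; a stale
    /// or missing entry yields None so the caller reads from the vault again.
    pub fn get(&mut self, path: &str, now: Instant) -> Option<String> {
        let fresh = self
            .entries
            .get(path)
            .filter(|(_, cached_at)| now.saturating_duration_since(*cached_at) < self.ttl)
            .map(|(value, _)| value.clone());
        if fresh.is_none() {
            self.entries.remove(path); // drop the stale entry, if any
        }
        fresh
    }

    /// Stores a value, evicting the oldest entry when the cache is full.
    pub fn put(&mut self, path: &str, value: &str, now: Instant) {
        if self.entries.len() >= self.capacity && !self.entries.contains_key(path) {
            let oldest = self
                .entries
                .iter()
                .min_by_key(|(_, entry)| entry.1)
                .map(|(key, _)| key.clone());
            if let Some(key) = oldest {
                self.entries.remove(&key);
            }
        }
        self.entries.insert(path.to_string(), (value.to_string(), now));
    }
}
```

A short TTL (5 minutes here) bounds how long a rotated or revoked secret can still be served from the cache.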

Environment Variables:

export VAULT_ADDR="https://vault.example.com:8200"
export VAULT_TOKEN="s.abc123def456..."
export VAULT_NAMESPACE="provisioning"
export VAULT_CACERT="/etc/provisioning/vault-ca.pem"
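The environment variables mirror keys in the `[secrets]` config, but the ADR does not say how the two sources combine. The sketch below assumes environment variables override file values. It takes the environment as a map so the precedence rule can be tested; in a real binary, the map could come from `std::env::vars().collect()`. `SecretsConfig` and `apply_env` are hypothetical names.

```rust
use std::collections::HashMap;

/// Values from `[secrets]` / `[secrets.tls]` in config.defaults.toml.
#[derive(Debug, Clone, PartialEq)]
pub struct SecretsConfig {
    pub vault_addr: String,
    pub vault_namespace: String,
    pub ca_cert: String,
}

/// Applies environment overrides on top of the file configuration
/// (assumed precedence: environment wins over file).
pub fn apply_env(mut cfg: SecretsConfig, env: &HashMap<String, String>) -> SecretsConfig {
    if let Some(v) = env.get("VAULT_ADDR") {
        cfg.vault_addr = v.clone();
    }
    if let Some(v) = env.get("VAULT_NAMESPACE") {
        cfg.vault_namespace = v.clone();
    }
    if let Some(v) = env.get("VAULT_CACERT") {
        cfg.ca_cert = v.clone();
    }
    cfg
}
```

`VAULT_TOKEN` has no counterpart in the config file, which keeps the credential itself out of version-controlled configuration.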

Migration Path

Phase 1: Deploy SecretumVault

  • Deploy vault cluster in HA mode
  • Initialize and configure backends
  • Set up Cedar policies

Phase 2: Migrate Static Secrets

  • Import SOPS secrets into vault KV store
  • Update Nickel configs to reference vault paths
  • Verify secret access via new API

Phase 3: Enable Dynamic Secrets

  • Configure database secret engine
  • Configure SSH CA secret engine
  • Update applications to use dynamic credentials

Phase 4: Deprecate SOPS for Runtime

  • SOPS remains for gitops config files
  • Runtime secrets exclusively from vault
  • Audit trail enforcement

Phase 5: Automation

  • Automatic rotation policies
  • Lease renewal automation
  • Monitoring and alerting

Documentation Requirements

User Guides:

  • docs/user/secrets-management.md - Using SecretumVault
  • docs/user/dynamic-credentials.md - Dynamic secret workflows
  • docs/user/secret-rotation.md - Rotation policies and procedures

Operations Documentation:

  • docs/operations/vault-deployment.md - Deploying and configuring vault
  • docs/operations/vault-backup-restore.md - Backup and disaster recovery
  • docs/operations/vault-monitoring.md - Metrics, logs, alerts

Developer Documentation:

  • docs/development/secrets-api.md - Rust client library usage
  • docs/development/cedar-secret-policies.md - Writing Cedar policies for secrets
  • Secret engine development guide

Security Documentation:

  • docs/security/secrets-architecture.md - Security architecture overview
  • docs/security/audit-logging.md - Audit trail and compliance
  • Threat model and risk assessment



Status: Accepted
Last Updated: 2025-01-08
Implementation: Planned
Priority: High (Security and compliance)
Estimated Complexity: Complex