ADR-014: SecretumVault Integration for Secrets Management
Status
Accepted - 2025-01-08
Context
The provisioning system manages sensitive data across multiple infrastructure layers: cloud provider credentials, database passwords, API keys, SSH keys, encryption keys, and service tokens. The current security architecture (ADR-009) includes SOPS for encrypted config files and Age for key management, but lacks a centralized secrets management solution with dynamic secrets, access control, and audit logging.
Current Secrets Management Challenges
Existing Approach:
- SOPS + Age: Static secrets encrypted in config files
  - Good: Version-controlled, gitops-friendly
  - Limited: Manual rotation only, no audit trail, manual key distribution
- Nickel Configuration: Declarative secret references
  - Good: Type-safe configuration
  - Limited: Cannot generate dynamic secrets, no lifecycle management
- Manual Secret Injection: Environment variables, CLI flags
  - Good: Simple for development
  - Limited: No security guarantees, prone to leakage
Problems Without Centralized Secrets Management
Security Issues:
- ❌ No centralized audit trail (who accessed which secret when)
- ❌ No automatic secret rotation policies
- ❌ No fine-grained access control (Cedar policies not enforced on secrets)
- ❌ Secrets scattered across: SOPS files, env vars, config files, K8s secrets
- ❌ No detection of secret sprawl or leaked credentials
Operational Issues:
- ❌ Manual secret rotation (error-prone, often neglected)
- ❌ No secret versioning (cannot rollback to previous credentials)
- ❌ Difficult onboarding (manual key distribution)
- ❌ No dynamic secrets (credentials exist indefinitely)
Compliance Issues:
- ❌ Cannot prove compliance with secret access policies
- ❌ No audit logs for regulatory requirements
- ❌ Cannot enforce secret expiration policies
- ❌ Difficult to demonstrate least-privilege access
Use Cases Requiring Centralized Secrets Management
- Dynamic Database Credentials:
  - Generate short-lived DB credentials for applications
  - Automatic rotation based on policies
  - Revocation on application termination
- Cloud Provider API Keys:
  - Centralized storage with access control
  - Audit trail of credential usage
  - Automatic rotation schedules
- Service-to-Service Authentication:
  - Dynamic tokens for microservices
  - Short-lived certificates for mTLS
  - Automatic renewal before expiration
- SSH Key Management:
  - Temporal SSH keys (ADR-009 SSH integration)
  - Centralized certificate authority
  - Audit trail of SSH access
- Encryption Key Management:
  - Master encryption keys for data at rest
  - Key rotation and versioning
  - Integration with KMS systems
Requirements for Secrets Management System
- ✅ Dynamic Secrets: Generate credentials on-demand with TTL
- ✅ Access Control: Integration with Cedar authorization policies
- ✅ Audit Logging: Complete trail of secret access and modifications
- ✅ Secret Rotation: Automatic and manual rotation policies
- ✅ Versioning: Track secret versions, enable rollback
- ✅ High Availability: Distributed, fault-tolerant architecture
- ✅ Encryption at Rest: AES-256-GCM for stored secrets
- ✅ API-First: RESTful API for integration
- ✅ Plugin Ecosystem: Extensible backends (AWS, Azure, databases)
- ✅ Open Source: Self-hosted, no vendor lock-in
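The TTL and renewal requirements can be pictured as lease bookkeeping on the client side. The sketch below is hypothetical: the Lease type, the use of a Duration offset as the clock, and the rule of renewing once two thirds of the TTL has elapsed are illustrative assumptions, not SecretumVault's API.

```rust
use std::time::Duration;

/// Hypothetical client-side view of a dynamic-secret lease.
#[derive(Debug, Clone)]
pub struct Lease {
    pub id: String,
    /// Issue time as an offset from an arbitrary clock origin (illustrative).
    pub issued_at: Duration,
    pub ttl: Duration,
}

impl Lease {
    /// Absolute expiry time of the lease.
    pub fn expires_at(&self) -> Duration {
        self.issued_at + self.ttl
    }

    /// True once the lease has expired at time `now`.
    pub fn is_expired(&self, now: Duration) -> bool {
        now >= self.expires_at()
    }

    /// Assumed policy: renew once two thirds of the TTL has elapsed,
    /// keeping the final third as a safety margin before expiry.
    pub fn should_renew(&self, now: Duration) -> bool {
        let renew_at = self.issued_at + self.ttl * 2 / 3;
        now >= renew_at && !self.is_expired(now)
    }

    /// A successful renewal re-issues the lease at `now` with the same TTL.
    pub fn renewed(&self, now: Duration) -> Lease {
        Lease { id: self.id.clone(), issued_at: now, ttl: self.ttl }
    }
}
```

A renewal loop would call should_renew periodically and ask the server to extend the lease when it returns true; renewing early leaves time to retry if the first attempt fails.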
Decision
Integrate SecretumVault as the centralized secrets management system for the provisioning platform.
Architecture Diagram
┌─────────────────────────────────────────────────────────────┐
│ Provisioning CLI / Orchestrator / Services │
│ │
│ - Workspace initialization (credentials) │
│ - Infrastructure deployment (cloud API keys) │
│ - Service configuration (database passwords) │
│ - SSH temporal keys (certificate generation) │
└────────────┬────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ SecretumVault Client Library (Rust) │
│ (provisioning/core/libs/secretum-client/) │
│ │
│ - Authentication (token, mTLS) │
│ - Secret CRUD operations │
│ - Dynamic secret generation │
│ - Lease renewal and revocation │
│ - Policy enforcement │
└────────────┬────────────────────────────────────────────────┘
│ HTTPS + mTLS
▼
┌─────────────────────────────────────────────────────────────┐
│ SecretumVault Server │
│ (Rust-based Vault implementation) │
│ │
│ ┌───────────────────────────────────────────────────┐ │
│ │ API Layer (REST + gRPC) │ │
│ ├───────────────────────────────────────────────────┤ │
│ │ Authentication & Authorization │ │
│ │ - Token auth, mTLS, OIDC integration │ │
│ │ - Cedar policy enforcement │ │
│ ├───────────────────────────────────────────────────┤ │
│ │ Secret Engines │ │
│ │ - KV (key-value v2 with versioning) │ │
│ │ - Database (dynamic credentials) │ │
│ │ - SSH (certificate authority) │ │
│ │ - PKI (X.509 certificates) │ │
│ │ - Cloud Providers (AWS/Azure/OCI) │ │
│ ├───────────────────────────────────────────────────┤ │
│ │ Storage Backend │ │
│ │ - Encrypted storage (AES-256-GCM) │ │
│ │ - PostgreSQL / Raft cluster │ │
│ ├───────────────────────────────────────────────────┤ │
│ │ Audit Backend │ │
│ │ - Structured logging (JSON) │ │
│ │ - Syslog, file, database sinks │ │
│ └───────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Backends (Dynamic Secret Generation) │
│ │
│ - PostgreSQL/MySQL (database credentials) │
│ - AWS IAM (temporary access keys) │
│ - Azure AD (service principals) │
│ - SSH CA (signed certificates) │
│ - PKI (X.509 certificates) │
└─────────────────────────────────────────────────────────────┘
Implementation Characteristics
SecretumVault Provides:
- ✅ Dynamic secret generation with configurable TTL
- ✅ Secret versioning and rollback capabilities
- ✅ Fine-grained access control (Cedar policies)
- ✅ Complete audit trail (all operations logged)
- ✅ Automatic secret rotation policies
- ✅ High availability (Raft consensus)
- ✅ Encryption at rest (AES-256-GCM)
- ✅ Plugin architecture for secret backends
- ✅ RESTful and gRPC APIs
- ✅ Rust implementation (performance, safety)
Integration with Provisioning System:
- ✅ Rust client library (native integration)
- ✅ Nushell commands via CLI wrapper
- ✅ Nickel configuration references secrets
- ✅ Cedar policies control secret access
- ✅ Orchestrator manages secret lifecycle
- ✅ SSH integration for temporal keys
- ✅ KMS integration for encryption keys
Rationale
Why SecretumVault Is Required
| Aspect | SOPS + Age (current) | HashiCorp Vault | SecretumVault (chosen) |
|---|---|---|---|
| Dynamic Secrets | ❌ Static only | ✅ Full support | ✅ Full support |
| Rust Native | ⚠️ External CLI | ❌ Go binary | ✅ Pure Rust |
| Cedar Integration | ❌ None | ❌ Custom policies | ✅ Native Cedar |
| Audit Trail | ❌ Git only | ✅ Comprehensive | ✅ Comprehensive |
| Secret Rotation | ❌ Manual | ✅ Automatic | ✅ Automatic |
| Open Source | ✅ Yes | ⚠️ No (MPL 2.0 until 2023, now BSL 1.1) | ✅ Yes |
| Self-Hosted | ✅ Yes | ✅ Yes | ✅ Yes |
| License | ✅ Permissive | ⚠️ BSL 1.1 (source-available) | ✅ Permissive |
| Versioning | ⚠️ Git commits | ✅ Built-in | ✅ Built-in |
| High Availability | ❌ Single file | ✅ Raft cluster | ✅ Raft cluster |
| Performance | ✅ Fast (local) | ⚠️ Network latency | ✅ Rust performance |
Why Not Continue with SOPS Alone?
SOPS is excellent for static secrets in git, but inadequate for:
- Dynamic Credentials: Cannot generate temporary DB passwords
- Audit Trail: Git commits are insufficient for compliance
- Rotation Policies: Manual rotation is error-prone
- Access Control: No runtime policy enforcement
- Secret Lifecycle: Cannot track usage or revoke access
- Multi-System Integration: Limited to files, not API-accessible
Complementary Approach:
- SOPS: Configuration files with long-lived secrets (gitops workflow)
- SecretumVault: Runtime dynamic secrets, short-lived credentials, audit trail
Why SecretumVault Over HashiCorp Vault?
HashiCorp Vault Limitations:
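- License Change: Since August 2023, released under the Business Source License (BSL) 1.1, a source-available license that restricts use in products competing with HashiCorp's offerings
- Not Rust Native: Go binary; Rust integration requires an HTTP API client or shelling out to the vault CLI (subprocess overhead)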
- Custom Policy Language: HCL policies, not Cedar (provisioning standard)
- Complex Deployment: Heavy operational burden
- Vendor Lock-In: HashiCorp ecosystem dependency
SecretumVault Advantages:
- Rust Native: Zero-cost integration, no subprocess spawning
- Cedar Policies: Consistent with ADR-008 authorization model
- Lightweight: Smaller binary, lower resource usage
- Open Source: Permissive license, community-driven
- Provisioning-First: Designed for IaC workflows
Integration with Existing Security Architecture
ADR-009 (Security System):
- SOPS: Static config encryption (unchanged)
- Age: Key management for SOPS (unchanged)
- SecretumVault: Dynamic secrets, runtime access control (new)
ADR-008 (Cedar Authorization):
- Cedar policies control SecretumVault secret access
- Fine-grained permissions: read:secret:database/prod/password
- Audit trail records Cedar policy decisions
SSH Temporal Keys:
- SecretumVault SSH CA signs user certificates
- Short-lived certificates (1-24 hours)
- Audit trail of SSH access
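The permission strings referenced under ADR-008 follow an action:resource-type:path shape. As an illustration only, the sketch below parses such a string and maps it to the Secret::"..." entity form used in the Cedar policies of this ADR; the Permission type and the capitalisation rule are assumptions, not a defined format.

```rust
/// Hypothetical parsed form of a permission string such as
/// "read:secret:database/prod/password" (action:resource-type:path).
#[derive(Debug, PartialEq)]
pub struct Permission<'a> {
    pub action: &'a str,
    pub resource_type: &'a str,
    pub path: &'a str,
}

pub fn parse_permission(s: &str) -> Option<Permission<'_>> {
    // splitn(3) keeps any further ':' inside the path segment.
    let mut parts = s.splitn(3, ':');
    let (action, resource_type, path) = (parts.next()?, parts.next()?, parts.next()?);
    if action.is_empty() || resource_type.is_empty() || path.is_empty() {
        return None;
    }
    Some(Permission { action, resource_type, path })
}

impl Permission<'_> {
    /// Cedar entity form, e.g. Secret::"database/prod/password".
    /// Capitalising the resource type is an assumption of this sketch.
    pub fn cedar_resource(&self) -> String {
        let mut chars = self.resource_type.chars();
        let ty: String = match chars.next() {
            Some(c) => c.to_uppercase().chain(chars).collect(),
            None => String::new(),
        };
        format!("{ty}::\"{}\"", self.path)
    }
}
```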
Consequences
Positive
- Security Posture: Centralized secrets with audit trail and rotation
- Compliance: Complete audit logs for regulatory requirements
- Operational Excellence: Automatic rotation, dynamic credentials
- Developer Experience: Simple API for secret access
- Performance: Rust implementation, zero-cost abstractions
- Consistency: Cedar policies across entire system (auth + secrets)
- Observability: Metrics, logs, traces for secret access
- Disaster Recovery: Secret versioning enables rollback
Negative
- Infrastructure Complexity: Additional service to deploy and operate
- High Availability Requirements: Raft cluster needs 3+ nodes
- Migration Effort: Existing SOPS secrets need migration path
- Learning Curve: Operators must learn vault concepts
- Dependency Risk: Critical path service (secrets unavailable = system down)
Mitigation Strategies
High Availability:
# Deploy SecretumVault cluster (3 nodes)
provisioning deploy secretum-vault --ha --replicas 3
# Automatic leader election via Raft
# Clients auto-reconnect to leader
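Client reconnection after a Raft leader change is commonly done with capped exponential backoff. The following is a sketch under that assumption: retry_with_backoff, the base * 2^attempt schedule and the injected sleep function are hypothetical, and op stands in for whatever call locates and connects to the current leader.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `max`.
pub fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // Saturate instead of overflowing for large attempt counts.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).map_or(max, |d| d.min(max))
}

/// Run `op` at least once and up to `max_attempts` times, calling `sleep`
/// between failures. `sleep` is injected so callers can pass
/// std::thread::sleep or an async-runtime timer wrapper.
pub fn retry_with_backoff<T, E>(
    max_attempts: u32,
    base: Duration,
    max: Duration,
    mut op: impl FnMut(u32) -> Result<T, E>,
    mut sleep: impl FnMut(Duration),
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op(attempt) {
            Ok(v) => return Ok(v),
            // Out of attempts: surface the last error to the caller.
            Err(e) if attempt + 1 >= max_attempts => return Err(e),
            Err(_) => {
                sleep(backoff_delay(attempt, base, max));
                attempt += 1;
            }
        }
    }
}
```

Capping the delay keeps clients from waiting much longer than a typical leader election, while the exponential growth avoids a reconnect storm against a cluster that is still electing.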
Migration from SOPS:
# Phase 1: Import existing SOPS secrets into SecretumVault
provisioning secrets migrate --from-sops config/secrets.yaml
# Phase 2: Update Nickel configs to reference vault paths
# Phase 3: Deprecate SOPS for runtime secrets (keep for config files)
Fallback Strategy:
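// Graceful degradation if vault unavailable.
// Assumes the tracing crate, the client library's VaultError enum, and a
// hypothetical sops_decrypt helper returning the same secret type.
use tracing::warn;

let secret = match vault_client.get_secret("database/password").await {
    Ok(s) => s,
    Err(VaultError::Unavailable) => {
        // Fallback to SOPS for read-only operations (static secrets only;
        // dynamic credentials cannot be served from SOPS)
        warn!("Vault unavailable, using SOPS fallback");
        sops_decrypt("config/secrets.yaml", "database.password")?
    }
    Err(e) => return Err(e.into()),
};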
Operational Monitoring:
# prometheus metrics
secretum_vault_request_duration_seconds
secretum_vault_secret_lease_expiry
secretum_vault_auth_failures_total
secretum_vault_raft_leader_changes
# Alerts: Vault unavailable, high auth failure rate, lease expiry
Alternatives Considered
Alternative 1: Continue with SOPS Only
Pros: No new infrastructure, simple
Cons: No dynamic secrets, no audit trail, manual rotation
Decision: REJECTED - Insufficient for production security
Alternative 2: HashiCorp Vault
Pros: Mature, feature-rich, widely adopted
Cons: BSL license, Go binary, HCL policies (not Cedar), complex deployment
Decision: REJECTED - License and integration concerns
Alternative 3: Cloud Provider Native (AWS Secrets Manager, Azure Key Vault)
Pros: Fully managed, high availability
Cons: Vendor lock-in, multi-cloud complexity, cost at scale
Decision: REJECTED - Against open-source and multi-cloud principles
Alternative 4: CyberArk, 1Password, etc.
Pros: Enterprise features
Cons: Proprietary, expensive, poor API integration
Decision: REJECTED - Not suitable for IaC automation
Alternative 5: Build Custom Secrets Manager
Pros: Full control, tailored to needs
Cons: High maintenance burden, security risk, reinventing the wheel
Decision: REJECTED - SecretumVault provides this already
Implementation Details
SecretumVault Deployment
# Deploy via provisioning system
provisioning deploy secretum-vault \
--ha \
--replicas 3 \
--storage postgres \
--tls-cert /path/to/cert.pem \
--tls-key /path/to/key.pem
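# Initialize: split the master key into 5 unseal key shares, any 3 of which unseal the vault
provisioning vault init --key-shares 5 --key-threshold 3
# Unseal: run once per key share until the threshold (3 shares) is reached
provisioning vault unseal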
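The 5-share / 3-threshold parameters describe a Shamir secret sharing split, the scheme HashiCorp Vault uses for unsealing; assuming SecretumVault follows the same model, any 3 shares reconstruct the master key while 2 reveal nothing about it. The sketch below only illustrates the math over the prime field GF(2^61 - 1) with caller-supplied coefficients. It is not SecretumVault's implementation: real implementations draw coefficients from a CSPRNG and operate on byte strings (Vault works over GF(2^8)).

```rust
/// Mersenne prime 2^61 - 1; all arithmetic is modulo P.
const P: u128 = (1u128 << 61) - 1;

/// Field multiplication; inputs are < 2^61, so the product fits in u128.
fn mul(a: u128, b: u128) -> u128 {
    (a % P) * (b % P) % P
}

/// Square-and-multiply exponentiation modulo P.
fn pow(mut base: u128, mut exp: u128) -> u128 {
    let mut acc = 1;
    base %= P;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = mul(acc, base);
        }
        base = mul(base, base);
        exp >>= 1;
    }
    acc
}

/// Modular inverse via Fermat's little theorem (P is prime, a != 0 mod P).
fn inv(a: u128) -> u128 {
    pow(a, P - 2)
}

/// Split `secret` into `n` shares (x, f(x)) for x = 1..=n, where
/// f(x) = secret + c1*x + ... + c_{k-1}*x^{k-1}; threshold k = coeffs.len() + 1.
pub fn split(secret: u128, coeffs: &[u128], n: u128) -> Vec<(u128, u128)> {
    (1..=n)
        .map(|x| {
            // Horner's rule, highest-degree coefficient first.
            let mut y = 0;
            for &c in coeffs.iter().rev() {
                y = (mul(y, x) + c % P) % P;
            }
            y = (mul(y, x) + secret % P) % P;
            (x, y)
        })
        .collect()
}

/// Recover f(0) by Lagrange interpolation at x = 0.
/// Shares must have distinct x values; fewer than k shares give a wrong value.
pub fn reconstruct(shares: &[(u128, u128)]) -> u128 {
    let mut secret = 0;
    for (i, &(xi, yi)) in shares.iter().enumerate() {
        let (mut num, mut den) = (1, 1);
        for (j, &(xj, _)) in shares.iter().enumerate() {
            if i != j {
                // Basis polynomial at 0: product of x_j / (x_j - x_i).
                num = mul(num, xj);
                den = mul(den, (xj + P - xi) % P);
            }
        }
        secret = (secret + mul(yi, mul(num, inv(den)))) % P;
    }
    secret
}
```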
Rust Client Library
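// provisioning/core/libs/secretum-client/src/lib.rs
use std::time::Duration;

// Types assumed to be exported by the secretum_vault crate
use secretum_vault::{Auth, Certificate, Client, DbCredentials, Result, Secret, TlsConfig};

/// Thin wrapper used by the CLI, orchestrator and services
pub struct VaultClient {
    client: Client,
}

impl VaultClient {
    /// Builds a client authenticated with a token over mTLS
    pub async fn new(addr: &str, token: &str) -> Result<Self> {
        let client = Client::new(addr)
            .auth(Auth::Token(token))
            .tls_config(TlsConfig::from_files("ca.pem", "cert.pem", "key.pem"))?
            .build()?;
        Ok(Self { client })
    }

    /// Reads a secret from the KV v2 engine
    pub async fn get_secret(&self, path: &str) -> Result<Secret> {
        self.client.kv2().get(path).await
    }

    /// Generates short-lived credentials for a database role
    pub async fn create_dynamic_db_credentials(&self, role: &str) -> Result<DbCredentials> {
        self.client.database().generate_credentials(role).await
    }

    /// Signs an SSH public key with the vault SSH CA, valid for `ttl`
    pub async fn sign_ssh_key(&self, public_key: &str, ttl: Duration) -> Result<Certificate> {
        self.client.ssh().sign_key(public_key, ttl).await
    }
}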
Nushell Integration
# Nushell commands via Rust CLI wrapper
provisioning secrets get database/prod/password
provisioning secrets set api/keys/stripe --value "sk_live_xyz"
provisioning secrets rotate database/prod/password
provisioning secrets lease renew lease_id_12345
provisioning secrets list database/
Nickel Configuration Integration
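# provisioning/schemas/database.ncl
# secrets is assumed to be a provisioning-provided helper library (import path hypothetical)
let secrets = import "secrets.ncl" in
{
  database = {
    host = "postgres.example.com",
    port = 5432,
    username = secrets.get "database/prod/username",
    password = secrets.get "database/prod/password",
  }
}
# Nickel evaluation is pure and performs no network I/O, so secrets.get is
# expected to emit a vault reference that the provisioning runtime resolves
# through the SecretumVault API at deploy time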
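Because Nickel evaluation cannot call the vault itself, one plausible design is for secrets.get to yield a placeholder such as vault:database/prod/password that the provisioning runtime replaces after evaluation. The sketch below assumes exactly that: the vault: prefix, the flat key/value view of the evaluated config, and resolve_refs are hypothetical, and the fetch closure stands in for a SecretumVault KV read.

```rust
use std::collections::BTreeMap;

/// Hypothetical marker emitted by `secrets.get` in the evaluated config.
const VAULT_PREFIX: &str = "vault:";

/// Replace every `vault:<path>` value with the secret returned by `fetch`.
/// Any failed lookup aborts the whole resolution, so a half-resolved
/// config is never handed to a deployment.
pub fn resolve_refs<F>(
    config: &BTreeMap<String, String>,
    mut fetch: F,
) -> Result<BTreeMap<String, String>, String>
where
    F: FnMut(&str) -> Result<String, String>,
{
    config
        .iter()
        .map(|(key, value)| -> Result<(String, String), String> {
            let resolved = match value.strip_prefix(VAULT_PREFIX) {
                // Secret reference: look it up, tagging errors with the config key.
                Some(path) => fetch(path).map_err(|e| format!("{key}: {e}"))?,
                // Plain value: pass through unchanged.
                None => value.clone(),
            };
            Ok((key.clone(), resolved))
        })
        .collect()
}
```

Resolving late keeps plaintext secrets out of the evaluated Nickel output and out of anything cached from it.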
Cedar Policy for Secret Access
// policy: developers can read dev secrets, not prod
permit(
principal in Group::"developers",
action == Action::"read",
resource in Secret::"database/dev"
);
forbid(
principal in Group::"developers",
action == Action::"read",
resource in Secret::"database/prod"
);
// policy: CI/CD can generate dynamic DB credentials
permit(
principal == Service::"github-actions",
action == Action::"generate",
resource in Secret::"database/dynamic"
) when {
context.ttl <= duration("1h")
};
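Cedar authorization is deny-by-default, and any matching forbid overrides every matching permit. The sketch below models that decision rule for the three policies above. It is a simplified illustration, not the Cedar engine: treating resource in Secret::"database/dev" as a path-prefix match and reducing the when clause to a maximum TTL in seconds are assumptions of this sketch, and a real deployment would evaluate the policies with the cedar-policy crate.

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
pub enum Effect {
    Permit,
    Forbid,
}

/// Simplified policy: principal, action, resource path prefix, and an
/// optional TTL limit standing in for `when { context.ttl <= ... }`.
pub struct Policy {
    pub effect: Effect,
    pub principal: &'static str,
    pub action: &'static str,
    pub resource_prefix: &'static str,
    pub max_ttl_secs: Option<u64>,
}

pub struct Request<'a> {
    /// The principal itself plus every group it is `in`.
    pub principals: &'a [&'a str],
    pub action: &'a str,
    pub resource: &'a str,
    pub ttl_secs: u64,
}

/// "database/dev" contains itself and anything under "database/dev/".
fn in_hierarchy(resource: &str, prefix: &str) -> bool {
    resource == prefix
        || resource.strip_prefix(prefix).map_or(false, |rest| rest.starts_with('/'))
}

fn policy_applies(p: &Policy, r: &Request<'_>) -> bool {
    r.principals.contains(&p.principal)
        && p.action == r.action
        && in_hierarchy(r.resource, p.resource_prefix)
        && p.max_ttl_secs.map_or(true, |max| r.ttl_secs <= max)
}

/// Allow only if at least one permit applies and no forbid applies.
pub fn is_authorized(policies: &[Policy], r: &Request<'_>) -> bool {
    let mut permitted = false;
    for p in policies.iter().filter(|p| policy_applies(p, r)) {
        match p.effect {
            Effect::Forbid => return false,
            Effect::Permit => permitted = true,
        }
    }
    permitted
}
```

Because forbid always wins, the developers/prod forbid stays effective even if a broader permit is added later.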
Dynamic Database Credentials
// Application requests temporary DB credentials via the VaultClient wrapper
let creds = vault_client
    .create_dynamic_db_credentials("postgres-readonly")
    .await?;
println!("Username: {}", creds.username); // e.g. v-app-abcd1234
println!("Password: {}", creds.password); // illustration only; never log secrets in production
println!("TTL: {:?}", creds.lease_duration); // Duration, e.g. 3600s (1h)
// Credentials automatically revoked after TTL
// No manual cleanup needed
Secret Rotation Automation
# secretum-vault config
[[rotation_policies]]
path = "database/prod/password"
schedule = "0 0 * * 0" # Weekly on Sunday midnight
max_age = "30d"
[[rotation_policies]]
path = "api/keys/stripe"
schedule = "0 0 1 * *" # Monthly on 1st
max_age = "90d"
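In these policies max_age acts as a backstop for the cron schedule: if a scheduled run is missed, a secret older than max_age is still due. The sketch below parses durations in the "30d" style and performs that check; the accepted units (s, m, h, d) and the rotation_due function are assumptions, and cron evaluation itself is not shown.

```rust
use std::time::Duration;

/// Parse "<number><unit>" with unit s, m, h or d, e.g. "30d" or "5m".
pub fn parse_age(s: &str) -> Option<Duration> {
    let unit = s.chars().last()?;
    let n: u64 = s[..s.len() - unit.len_utf8()].parse().ok()?;
    let secs_per_unit = match unit {
        's' => 1,
        'm' => 60,
        'h' => 3_600,
        'd' => 86_400,
        _ => return None,
    };
    // Reject values that would overflow u64 seconds.
    n.checked_mul(secs_per_unit).map(Duration::from_secs)
}

/// Rotation is due once the current secret version is at least `max_age` old,
/// regardless of whether the cron schedule has fired. Returns None for an
/// unparseable max_age so misconfiguration is surfaced rather than ignored.
pub fn rotation_due(version_age: Duration, max_age: &str) -> Option<bool> {
    parse_age(max_age).map(|max| version_age >= max)
}
```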
Audit Log Format
{
"timestamp": "2025-01-08T12:34:56Z",
"type": "request",
"auth": {
"client_token": "sha256:abc123...",
"accessor": "hmac:def456...",
"display_name": "service-orchestrator",
"policies": ["default", "service-policy"]
},
"request": {
"operation": "read",
"path": "secret/data/database/prod/password",
"remote_address": "10.0.1.5"
},
"response": {
"status": 200
},
"cedar_policy": {
"decision": "permit",
"policy_id": "allow-orchestrator-read-secrets"
}
}
Testing Strategy
Unit Tests:
#[tokio::test]
async fn test_get_secret() {
let vault = mock_vault_client();
let secret = vault.get_secret("test/secret").await.unwrap();
assert_eq!(secret.value, "expected-value");
}
#[tokio::test]
async fn test_dynamic_credentials_generation() {
let vault = mock_vault_client();
let creds = vault.create_dynamic_db_credentials("postgres-readonly").await.unwrap();
assert!(creds.username.starts_with("v-"));
assert_eq!(creds.lease_duration, Duration::from_secs(3600));
}
Integration Tests:
# Abort on the first failing check
set -e
# Test vault deployment
provisioning deploy secretum-vault --test-mode
provisioning vault init
provisioning vault unseal
# Test secret operations
provisioning secrets set test/secret --value "test-value"
test "$(provisioning secrets get test/secret)" = "test-value"
# Test dynamic credentials (assumes JSON output; -r strips the quotes)
provisioning secrets db-creds postgres-readonly | jq -r '.username' | grep -q '^v-'
# Test rotation
provisioning secrets rotate test/secret
Security Tests:
#[tokio::test]
async fn test_unauthorized_access_denied() {
let vault = vault_client_with_limited_token();
let result = vault.get_secret("database/prod/password").await;
assert!(matches!(result, Err(VaultError::PermissionDenied)));
}
Configuration Integration
Provisioning Config:
# provisioning/config/config.defaults.toml
[secrets]
provider = "secretum-vault" # "secretum-vault" | "sops" | "env"
vault_addr = "https://vault.example.com:8200"
vault_namespace = "provisioning"
vault_mount = "secret"
[secrets.tls]
ca_cert = "/etc/provisioning/vault-ca.pem"
client_cert = "/etc/provisioning/vault-client.pem"
client_key = "/etc/provisioning/vault-client-key.pem"
[secrets.cache]
enabled = true
ttl = "5m"
max_size = "100MB"
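The provider key selects one of three backends. A minimal sketch, assuming the value is parsed into an enum at startup so that a typo fails immediately instead of at the first secret lookup; SecretsProvider and its error message are hypothetical names, not part of the existing codebase.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SecretsProvider {
    SecretumVault,
    Sops,
    Env,
}

impl FromStr for SecretsProvider {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "secretum-vault" => Ok(Self::SecretumVault),
            "sops" => Ok(Self::Sops),
            "env" => Ok(Self::Env),
            other => Err(format!(
                "unknown secrets provider {other:?}; expected \"secretum-vault\", \"sops\" or \"env\""
            )),
        }
    }
}

impl SecretsProvider {
    /// Per this ADR, only the vault provider issues dynamic secrets and leases;
    /// SOPS and environment variables hold static values.
    pub fn supports_dynamic_secrets(self) -> bool {
        matches!(self, Self::SecretumVault)
    }
}
```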
Environment Variables:
export VAULT_ADDR="https://vault.example.com:8200"
export VAULT_TOKEN="s.abc123def456..."
export VAULT_NAMESPACE="provisioning"
export VAULT_CACERT="/etc/provisioning/vault-ca.pem"
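These variables mirror keys in the [secrets] and [secrets.tls] tables. The ADR does not state which source wins; the sketch below assumes the common convention that a non-empty environment variable overrides the config file. The environment lookup is injected so the logic can be tested deterministically; in production it would be |k| std::env::var(k).ok(). VaultSettings, ConfigDefaults and resolve_settings are hypothetical names.

```rust
/// Effective connection settings after merging config file and environment.
#[derive(Debug, PartialEq)]
pub struct VaultSettings {
    pub addr: String,
    pub namespace: String,
    pub ca_cert: String,
}

/// Values read from config.defaults.toml.
pub struct ConfigDefaults<'a> {
    pub vault_addr: &'a str,
    pub vault_namespace: &'a str,
    pub ca_cert: &'a str,
}

/// Assumed precedence: non-empty environment variable > config file value.
pub fn resolve_settings(
    cfg: &ConfigDefaults<'_>,
    env: impl Fn(&str) -> Option<String>,
) -> VaultSettings {
    // Empty variables (e.g. `export VAULT_ADDR=`) fall back to the config file.
    let pick = |var: &str, fallback: &str| {
        env(var).filter(|v| !v.is_empty()).unwrap_or_else(|| fallback.to_string())
    };
    VaultSettings {
        addr: pick("VAULT_ADDR", cfg.vault_addr),
        namespace: pick("VAULT_NAMESPACE", cfg.vault_namespace),
        ca_cert: pick("VAULT_CACERT", cfg.ca_cert),
    }
}
```

VAULT_TOKEN is deliberately left out of this sketch on the assumption that tokens never live in the checked-in config file and are read only from the environment or an auth method.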
Migration Path
Phase 1: Deploy SecretumVault
- Deploy vault cluster in HA mode
- Initialize and configure backends
- Set up Cedar policies
Phase 2: Migrate Static Secrets
- Import SOPS secrets into vault KV store
- Update Nickel configs to reference vault paths
- Verify secret access via new API
Phase 3: Enable Dynamic Secrets
- Configure database secret engine
- Configure SSH CA secret engine
- Update applications to use dynamic credentials
Phase 4: Deprecate SOPS for Runtime
- SOPS remains for gitops config files
- Runtime secrets exclusively from vault
- Audit trail enforcement
Phase 5: Automation
- Automatic rotation policies
- Lease renewal automation
- Monitoring and alerting
Documentation Requirements
User Guides:
- docs/user/secrets-management.md - Using SecretumVault
- docs/user/dynamic-credentials.md - Dynamic secret workflows
- docs/user/secret-rotation.md - Rotation policies and procedures
Operations Documentation:
- docs/operations/vault-deployment.md - Deploying and configuring vault
- docs/operations/vault-backup-restore.md - Backup and disaster recovery
- docs/operations/vault-monitoring.md - Metrics, logs, alerts
Developer Documentation:
- docs/development/secrets-api.md - Rust client library usage
- docs/development/cedar-secret-policies.md - Writing Cedar policies for secrets
- Secret engine development guide
Security Documentation:
- docs/security/secrets-architecture.md - Security architecture overview
- docs/security/audit-logging.md - Audit trail and compliance
- Threat model and risk assessment
References
- SecretumVault GitHub (hypothetical, replace with actual)
- HashiCorp Vault Documentation (for comparison)
- ADR-008: Cedar Authorization (policy integration)
- ADR-009: Security System Complete (current security architecture)
- Raft Consensus Algorithm
- Cedar Policy Language
- SOPS: https://github.com/getsops/sops
- Age Encryption: https://age-encryption.org/
Status: Accepted
Last Updated: 2025-01-08
Implementation: Planned
Priority: High (Security and compliance)
Estimated Complexity: Complex