    ┌─────────────────────────────────────────────────────────────┐
    │  Provisioning CLI / Orchestrator / Services                 │
    │                                                             │
    │  - Workspace initialization (credentials)                   │
    │  - Infrastructure deployment (cloud API keys)               │
    │  - Service configuration (database passwords)               │
    │  - SSH temporal keys (certificate generation)               │
    └────────────┬────────────────────────────────────────────────┘
                 │
                 ▼
    ┌─────────────────────────────────────────────────────────────┐
    │  SecretumVault Client Library (Rust)                        │
    │  (provisioning/core/libs/secretum-client/)                  │
    │                                                             │
    │  - Authentication (token, mTLS)                             │
    │  - Secret CRUD operations                                   │
    │  - Dynamic secret generation                                │
    │  - Lease renewal and revocation                             │
    │  - Policy enforcement                                       │
    └────────────┬────────────────────────────────────────────────┘
                 │ HTTPS + mTLS
                 ▼
    ┌─────────────────────────────────────────────────────────────┐
    │  SecretumVault Server                                       │
    │  (Rust-based Vault implementation)                          │
    │                                                             │
    │  ┌───────────────────────────────────────────────────┐      │
    │  │  API Layer (REST + gRPC)                          │      │
    │  ├───────────────────────────────────────────────────┤      │
    │  │  Authentication & Authorization                   │      │
    │  │  - Token auth, mTLS, OIDC integration             │      │
    │  │  - Cedar policy enforcement                       │      │
    │  ├───────────────────────────────────────────────────┤      │
    │  │  Secret Engines                                   │      │
    │  │  - KV (key-value v2 with versioning)              │      │
    │  │  - Database (dynamic credentials)                 │      │
    │  │  - SSH (certificate authority)                    │      │
    │  │  - PKI (X.509 certificates)                       │      │
    │  │  - Cloud Providers (AWS/Azure/OCI)                │      │
    │  ├───────────────────────────────────────────────────┤      │
    │  │  Storage Backend                                  │      │
    │  │  - Encrypted storage (AES-256-GCM)                │      │
    │  │  - PostgreSQL / Raft cluster                      │      │
    │  ├───────────────────────────────────────────────────┤      │
    │  │  Audit Backend                                    │      │
    │  │  - Structured logging (JSON)                      │      │
    │  │  - Syslog, file, database sinks                   │      │
    │  └───────────────────────────────────────────────────┘      │
    └─────────────────────────────────────────────────────────────┘
                 │
                 ▼
    ┌─────────────────────────────────────────────────────────────┐
    │  Backends (Dynamic Secret Generation)                       │
    │                                                             │
    │  - PostgreSQL/MySQL (database credentials)                  │
    │  - AWS IAM (temporary access keys)                          │
    │  - Azure AD (service principals)                            │
    │  - SSH CA (signed certificates)                             │
    │  - PKI (X.509 certificates)                                 │
    └─────────────────────────────────────────────────────────────┘

### Implementation Characteristics

SecretumVault provides:

- ✅ Dynamic secret generation with configurable TTL
- ✅ Secret versioning and rollback capabilities
- ✅ Fine-grained access control (Cedar policies)
- ✅ Complete audit trail (all operations logged)
- ✅ Automatic secret rotation policies
- ✅ High availability (Raft consensus)
- ✅ Encryption at rest (AES-256-GCM)
- ✅ Plugin architecture for secret backends
- ✅ RESTful and gRPC APIs
- ✅ Rust implementation (performance, safety)

Integration with Provisioning System:

- ✅ Rust client library (native integration)
- ✅ Nushell commands via CLI wrapper
- ✅ Nickel configuration references secrets
- ✅ Cedar policies control secret access
- ✅ Orchestrator manages secret lifecycle
- ✅ SSH integration for temporal keys
- ✅ KMS integration for encryption keys

## Rationale

### Why SecretumVault Is Required

| Aspect | SOPS + Age (current) | HashiCorp Vault | SecretumVault (chosen) |
| -------- | ---------------------- | ----------------- | ------------------------ |
| Dynamic Secrets | ❌ Static only | ✅ Full support | ✅ Full support |
| Rust Native | ⚠️ External CLI | ❌ Go binary | ✅ Pure Rust |
| Cedar Integration | ❌ None | ❌ Custom policies | ✅ Native Cedar |
| Audit Trail | ❌ Git only | ✅ Comprehensive | ✅ Comprehensive |
| Secret Rotation | ❌ Manual | ✅ Automatic | ✅ Automatic |
| Open Source | ✅ Yes | ⚠️ Formerly MPL 2.0, now BSL | ✅ Yes |
| Self-Hosted | ✅ Yes | ✅ Yes | ✅ Yes |
| License | ✅ Permissive | ⚠️ BSL (source-available) | ✅ Permissive |
| Versioning | ⚠️ Git commits | ✅ Built-in | ✅ Built-in |
| High Availability | ❌ Single file | ✅ Raft cluster | ✅ Raft cluster |
| Performance | ✅ Fast (local) | ⚠️ Network latency | ✅ Rust performance |

### Why Not Continue with SOPS Alone

SOPS is excellent for static secrets in git, but inadequate for:

1. Dynamic Credentials: Cannot generate temporary DB passwords
2. Audit Trail: Git commits are insufficient for compliance
3. Rotation Policies: Manual rotation is error-prone
4. Access Control: No runtime policy enforcement
5. Secret Lifecycle: Cannot track usage or revoke access
6. Multi-System Integration: Limited to files, not API-accessible

Complementary Approach:
- SOPS: Configuration files with long-lived secrets (gitops workflow)
- SecretumVault: Runtime dynamic secrets, short-lived credentials, audit trail

### Why SecretumVault Over HashiCorp Vault

HashiCorp Vault Limitations:

1. License Change: BSL (Business Source License) - source-available, with restrictions on production use
2. Not Rust Native: Go binary, so integration goes through a subprocess or the HTTP API
3. Custom Policy Language: HCL policies, not Cedar (provisioning standard)
4. Complex Deployment: Heavy operational burden
5. Vendor Lock-In: HashiCorp ecosystem dependency

SecretumVault Advantages:

1. Rust Native: Zero-cost integration, no subprocess spawning
2. Cedar Policies: Consistent with ADR-008 authorization model
3. Lightweight: Smaller binary, lower resource usage
4. Open Source: Permissive license, community-driven
5. Provisioning-First: Designed for IaC workflows

### Integration with Existing Security Architecture

ADR-009 (Security System):
- SOPS: Static config encryption (unchanged)
- Age: Key management for SOPS (unchanged)
- SecretumVault: Dynamic secrets, runtime access control (new)

ADR-008 (Cedar Authorization):
- Cedar policies control SecretumVault secret access
- Fine-grained permissions: read:secret:database/prod/password
- Audit trail records Cedar policy decisions

SSH Temporal Keys:
- SecretumVault SSH CA signs user certificates
- Short-lived certificates (1-24 hours)
- Audit trail of SSH access

## Consequences

### Positive

- Security Posture: Centralized secrets with audit trail and rotation
- Compliance: Complete audit logs for regulatory requirements
- Operational Excellence: Automatic rotation, dynamic credentials
- Developer Experience: Simple API for secret access
- Performance: Rust implementation, zero-cost abstractions
- Consistency: Cedar policies across entire system (auth + secrets)
- Observability: Metrics, logs, traces for secret access
- Disaster Recovery: Secret versioning enables rollback

### Negative

- Infrastructure Complexity: Additional service to deploy and operate
- High Availability Requirements: Raft cluster needs 3+ nodes
- Migration Effort: Existing SOPS secrets need migration path
- Learning Curve: Operators must learn vault concepts
- Dependency Risk: Critical path service (secrets unavailable = system down)

### Mitigation Strategies

High Availability:

    # Deploy SecretumVault cluster (3 nodes)
    provisioning deploy secretum-vault --ha --replicas 3

    # Automatic leader election via Raft
    # Clients auto-reconnect to leader

Migration from SOPS:

    # Phase 1: Import existing SOPS secrets into SecretumVault
    provisioning secrets migrate --from-sops config/secrets.yaml

    # Phase 2: Update Nickel configs to reference vault paths
    # Phase 3: Deprecate SOPS for runtime secrets (keep for config files)

Fallback Strategy:

    // Graceful degradation if vault unavailable
    let secret = match vault_client.get_secret("database/password").await {
        Ok(s) => s,
        Err(VaultError::Unavailable) => {
            // Fallback to SOPS for read-only operations
            warn!("Vault unavailable, using SOPS fallback");
            sops_decrypt("config/secrets.yaml", "database.password")?
        },
        Err(e) => return Err(e),
    };

Operational Monitoring:

    # Prometheus metrics
    secretum_vault_request_duration_seconds
    secretum_vault_secret_lease_expiry
    secretum_vault_auth_failures_total
    secretum_vault_raft_leader_changes

    # Alerts: Vault unavailable, high auth failure rate, lease expiry

## Alternatives Considered

### Alternative 1: Continue with SOPS Only

Pros: No new infrastructure, simple
Cons: No dynamic secrets, no audit trail, manual rotation
Decision: REJECTED - Insufficient for production security

### Alternative 2: HashiCorp Vault

Pros: Mature, feature-rich, widely adopted
Cons: BSL license, Go binary, HCL policies (not Cedar), complex deployment
Decision: REJECTED - License and integration concerns

### Alternative 3: Cloud Provider Native (AWS Secrets Manager, Azure Key Vault)

Pros: Fully managed, high availability
Cons: Vendor lock-in, multi-cloud complexity, cost at scale
Decision: REJECTED - Against open-source and multi-cloud principles

### Alternative 4: CyberArk, 1Password, and Others

Pros: Enterprise features
Cons: Proprietary, expensive, poor API integration
Decision: REJECTED - Not suitable for IaC automation

### Alternative 5: Build Custom Secrets Manager

Pros: Full control, tailored to needs
Cons: High maintenance burden, security risk, reinventing the wheel
Decision: REJECTED - SecretumVault provides this already

## Implementation Details

### SecretumVault Deployment

    # Deploy via provisioning system
    provisioning deploy secretum-vault \
      --ha \
      --replicas 3 \
      --storage postgres \
      --tls-cert /path/to/cert.pem \
      --tls-key /path/to/key.pem

    # Initialize and unseal
    provisioning vault init
    provisioning vault unseal --key-shares 5 --key-threshold 3

### Rust Client Library

    // provisioning/core/libs/secretum-client/src/lib.rs

    use std::time::Duration;

    // Assumes the secretum_vault crate exports every type used below.
    use secretum_vault::{
        Auth, Certificate, Client, DbCredentials, Result, Secret, SecretEngine, TlsConfig,
    };

    pub struct VaultClient {
        client: Client,
    }

    impl VaultClient {
        /// Builds a client that authenticates with a token over mTLS.
        pub async fn new(addr: &str, token: &str) -> Result<Self> {
            let client = Client::new(addr)
                .auth(Auth::Token(token))
                .tls_config(TlsConfig::from_files("ca.pem", "cert.pem", "key.pem"))?
                .build()?;

            Ok(Self { client })
        }

        /// Reads a static secret from the KV v2 engine.
        pub async fn get_secret(&self, path: &str) -> Result<Secret> {
            self.client.kv2().get(path).await
        }

        /// Requests short-lived database credentials for the given role.
        pub async fn create_dynamic_db_credentials(&self, role: &str) -> Result<DbCredentials> {
            self.client.database().generate_credentials(role).await
        }

        /// Asks the SSH CA to sign a public key, valid for `ttl`.
        pub async fn sign_ssh_key(&self, public_key: &str, ttl: Duration) -> Result<Certificate> {
            self.client.ssh().sign_key(public_key, ttl).await
        }
    }

### Nushell Integration

    # Nushell commands via Rust CLI wrapper
    provisioning secrets get database/prod/password
    provisioning secrets set api/keys/stripe --value "sk_live_xyz"
    provisioning secrets rotate database/prod/password
    provisioning secrets lease renew lease_id_12345
    provisioning secrets list database/

### Nickel Configuration Integration

    # provisioning/schemas/database.ncl
    {
      database = {
        host = "postgres.example.com",
        port = 5432,
        username = secrets.get "database/prod/username",
        password = secrets.get "database/prod/password",
      }
    }

    # Nickel function: secrets.get resolves to SecretumVault API call

### Cedar Policy for Secret Access

    // policy: developers can read dev secrets, not prod
    permit(
      principal in Group::"developers",
      action == Action::"read",
      resource in Secret::"database/dev"
    );

    forbid(
      principal in Group::"developers",
      action == Action::"read",
      resource in Secret::"database/prod"
    );

    // policy: CI/CD can generate dynamic DB credentials
    permit(
      principal == Service::"github-actions",
      action == Action::"generate",
      resource in Secret::"database/dynamic"
    ) when {
      context.ttl <= duration("1h")
    };

### Dynamic Database Credentials

    // Application requests temporary DB credentials
    let creds = vault_client
        .database()
        .generate_credentials("postgres-readonly")
        .await?;

    println!("Username: {}", creds.username); // v-app-abcd1234
    println!("Password: {}", creds.password); // random-secure-password
    println!("TTL: {}", creds.lease_duration); // 1h

    // Credentials automatically revoked after TTL
    // No manual cleanup needed

### Secret Rotation Automation

    # secretum-vault config
    [[rotation_policies]]
    path = "database/prod/password"
    schedule = "0 0 * * 0" # Weekly on Sunday midnight
    max_age = "30d"

    [[rotation_policies]]
    path = "api/keys/stripe"
    schedule = "0 0 1 * *" # Monthly on 1st
    max_age = "90d"

### Audit Log Format

    {
      "timestamp": "2025-01-08T12:34:56Z",
      "type": "request",
      "auth": {
        "client_token": "sha256:abc123...",
        "accessor": "hmac:def456...",
        "display_name": "service-orchestrator",
        "policies": ["default", "service-policy"]
      },
      "request": {
        "operation": "read",
        "path": "secret/data/database/prod/password",
        "remote_address": "10.0.1.5"
      },
      "response": {
        "status": 200
      },
      "cedar_policy": {
        "decision": "permit",
        "policy_id": "allow-orchestrator-read-secrets"
      }
    }

## Testing Strategy

Unit Tests:

    #[tokio::test]
    async fn test_get_secret() {
        let vault = mock_vault_client();
        let secret = vault.get_secret("test/secret").await.unwrap();
        assert_eq!(secret.value, "expected-value");
    }

    #[tokio::test]
    async fn test_dynamic_credentials_generation() {
        let vault = mock_vault_client();
        let creds = vault.create_dynamic_db_credentials("postgres-readonly").await.unwrap();
        assert!(creds.username.starts_with("v-"));
        assert_eq!(creds.lease_duration, Duration::from_secs(3600));
    }

Integration Tests:

    # Test vault deployment
    provisioning deploy secretum-vault --test-mode
    provisioning vault init
    provisioning vault unseal

    # Test secret operations
    provisioning secrets set test/secret --value "test-value"
    provisioning secrets get test/secret | assert "test-value"

    # Test dynamic credentials
    provisioning secrets db-creds postgres-readonly | jq '.username' | assert-contains "v-"

    # Test rotation
    provisioning secrets rotate test/secret

Security Tests:

    #[tokio::test]
    async fn test_unauthorized_access_denied() {
        let vault = vault_client_with_limited_token();
        let result = vault.get_secret("database/prod/password").await;
        assert!(matches!(result, Err(VaultError::PermissionDenied)));
    }

## Configuration Integration

Provisioning Config:

    # provisioning/config/config.defaults.toml
    [secrets]
    provider = "secretum-vault" # "secretum-vault" | "sops" | "env"
    vault_addr = "https://vault.example.com:8200"
    vault_namespace = "provisioning"
    vault_mount = "secret"

    [secrets.tls]
    ca_cert = "/etc/provisioning/vault-ca.pem"
    client_cert = "/etc/provisioning/vault-client.pem"
    client_key = "/etc/provisioning/vault-client-key.pem"

    [secrets.cache]
    enabled = true
    ttl = "5m"
    max_size = "100MB"

Environment Variables:

    export VAULT_ADDR="https://vault.example.com:8200"
    export VAULT_TOKEN="s.abc123def456..."
    export VAULT_NAMESPACE="provisioning"
    export VAULT_CACERT="/etc/provisioning/vault-ca.pem"

## Migration Path

Phase 1: Deploy SecretumVault
- Deploy vault cluster in HA mode
- Initialize and configure backends
- Set up Cedar policies

Phase 2: Migrate Static Secrets
- Import SOPS secrets into vault KV store
- Update Nickel configs to reference vault paths
- Verify secret access via new API

Phase 3: Enable Dynamic Secrets
- Configure database secret engine
- Configure SSH CA secret engine
- Update applications to use dynamic credentials

Phase 4: Deprecate SOPS for Runtime
- SOPS remains for gitops config files
- Runtime secrets exclusively from vault
- Audit trail enforcement

Phase 5: Automation
- Automatic rotation policies
- Lease renewal automation
- Monitoring and alerting

## Documentation Requirements

User Guides:
- docs/user/secrets-management.md - Using SecretumVault
- docs/user/dynamic-credentials.md - Dynamic secret workflows
- docs/user/secret-rotation.md - Rotation policies and procedures

Operations Documentation:
- docs/operations/vault-deployment.md - Deploying and configuring vault
- docs/operations/vault-backup-restore.md - Backup and disaster recovery
- docs/operations/vault-monitoring.md - Metrics, logs, alerts

Developer Documentation:
- docs/development/secrets-api.md - Rust client library usage
- docs/development/cedar-secret-policies.md - Writing Cedar policies for secrets
- Secret engine development guide

Security Documentation:
- docs/security/secrets-architecture.md - Security architecture overview
- docs/security/audit-logging.md - Audit trail and compliance
- Threat model and risk assessment

## References

- SecretumVault GitHub (hypothetical, replace with actual)
- HashiCorp Vault Documentation (for comparison)
- ADR-008: Cedar Authorization (policy integration)
- ADR-009: Security System Complete (current security architecture)
- Raft Consensus Algorithm
- Cedar Policy Language
- SOPS: https://github.com/getsops/sops
- Age Encryption: https://age-encryption.org/

---

Status: Accepted
Last Updated: 2025-01-08
Implementation: Planned
Priority: High (Security and compliance)
Estimated Complexity: Complex
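
As a rough illustration of the "Lease renewal automation" item listed under Phase 5 of the migration path, the sketch below shows one way a client-side scheduler might decide what to do with a lease. Everything in it is an assumption made for illustration: the `Lease` struct, the `LeaseAction` enum, the `next_action` function, and the choice to renew once two thirds of the TTL has elapsed are hypothetical and are not part of the SecretumVault client API described in this ADR.

```rust
use std::time::Duration;

/// Hypothetical lease record; the field names are illustrative only.
#[derive(Debug, Clone)]
struct Lease {
    id: String,
    /// Total lifetime granted when the lease was issued.
    ttl: Duration,
    /// Time that has passed since the lease was issued.
    elapsed: Duration,
}

/// What a renewal loop should do next for a given lease.
#[derive(Debug, PartialEq)]
enum LeaseAction {
    /// Too early to renew; check again after this delay.
    Wait(Duration),
    /// Inside the renewal window; ask the vault to extend the lease.
    Renew,
    /// The lease has already expired; request fresh credentials.
    Reissue,
}

/// Decide the next step for a lease.
/// Assumed policy: renew once two thirds of the TTL has elapsed.
fn next_action(lease: &Lease) -> LeaseAction {
    let renew_at = lease.ttl * 2 / 3;
    if lease.elapsed >= lease.ttl {
        LeaseAction::Reissue
    } else if lease.elapsed >= renew_at {
        LeaseAction::Renew
    } else {
        // Safe subtraction: elapsed < renew_at in this branch.
        LeaseAction::Wait(renew_at - lease.elapsed)
    }
}

fn main() {
    let lease = Lease {
        id: "lease_id_12345".to_string(),
        ttl: Duration::from_secs(3600),
        elapsed: Duration::from_secs(600),
    };
    println!("{} -> {:?}", lease.id, next_action(&lease));
}
```

Keeping the decision in a pure function like this makes the renewal policy easy to unit-test without a running vault; the actual renew or reissue call would go through whatever the client library ends up exposing.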