provisioning/docs/src/security/secrets-management-guide.md
2026-01-14 03:09:18 +00:00

# Secrets Management System - Configuration Guide

Status: Production Ready
Date: 2025-11-19
Version: 1.0.0

## Overview

The provisioning system retrieves SSH keys from configurable secret sources (SOPS, KMS, RustyVault) instead of hardcoded filesystem paths.
When no secret source is configured, it falls back to local-dev mode, which is intended for development environments only.

## Secret Sources

### 1. SOPS (Secrets Operations)

Age-encrypted secrets file with a YAML structure.

Pros:

- Age encryption (modern, performant)
- Easy to version in Git (encrypted)
- No external services required
- Simple YAML structure

Cons:

- Requires Age key management
- No key rotation automation

Environment Variables:

```bash
PROVISIONING_SECRET_SOURCE=sops
PROVISIONING_SOPS_ENABLED=true
PROVISIONING_SOPS_SECRETS_FILE=/path/to/secrets.enc.yaml
PROVISIONING_SOPS_AGE_KEY_FILE=$HOME/.age/provisioning
```

Secrets File Structure (`provisioning/secrets.enc.yaml`):

```yaml
# Encrypted with sops
ssh:
  web-01:
    ubuntu: /path/to/id_rsa
    root: /path/to/root_id_rsa
  db-01:
    postgres: /path/to/postgres_id_rsa
```

Setup Instructions:

```bash
# 1. Install sops and age
brew install sops age

# 2. Generate Age key (store securely!)
mkdir -p "$HOME/.age"
age-keygen -o "$HOME/.age/provisioning"

# 3. Create the plaintext secrets file
cat > secrets.yaml << 'EOF'
ssh:
  web-01:
    ubuntu: ~/.ssh/provisioning_web01
  db-01:
    postgres: ~/.ssh/provisioning_db01
EOF

# 4. Encrypt in place with sops, using the Age public key as recipient
sops -e -i --age "$(age-keygen -y "$HOME/.age/provisioning")" secrets.yaml

# 5. Rename to enc version
mv secrets.yaml provisioning/secrets.enc.yaml

# 6. Configure environment
export PROVISIONING_SECRET_SOURCE=sops
export PROVISIONING_SOPS_SECRETS_FILE=$(pwd)/provisioning/secrets.enc.yaml
export PROVISIONING_SOPS_AGE_KEY_FILE=$HOME/.age/provisioning
```

### 2. KMS (Key Management Service)

AWS KMS or a compatible key management service. SSH keys are stored in AWS Secrets Manager.

Pros:

- Cloud-native security
- Automatic key rotation
- Audit logging built-in
- High availability

Cons:

- Requires AWS account/credentials
- API calls add latency (~50 ms)
- Cost per API call

Environment Variables:

```bash
PROVISIONING_SECRET_SOURCE=kms
PROVISIONING_KMS_ENABLED=true
PROVISIONING_KMS_REGION=us-east-1
```

Secret Storage Pattern:

```text
provisioning/ssh-keys/{hostname}/{username}
```

Setup Instructions:

```bash
# 1. Create KMS key (one-time) and give it the alias used later in this guide
KEY_ID=$(aws kms create-key \
  --description "Provisioning SSH Keys" \
  --region us-east-1 \
  --query KeyMetadata.KeyId --output text)
aws kms create-alias \
  --alias-name alias/provisioning \
  --target-key-id "$KEY_ID" \
  --region us-east-1

# 2. Store SSH keys in Secrets Manager
aws secretsmanager create-secret \
  --name provisioning/ssh-keys/web-01/ubuntu \
  --secret-string "$(cat ~/.ssh/provisioning_web01)" \
  --region us-east-1

# 3. Configure environment
export PROVISIONING_SECRET_SOURCE=kms
export PROVISIONING_KMS_REGION=us-east-1

# 4. Ensure AWS credentials available
export AWS_PROFILE=provisioning
# or
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
```

### 3. RustyVault (HashiCorp Vault-Compatible)

Self-hosted or managed Vault instance for secrets.

Pros:

- Self-hosted option
- Fine-grained access control
- Multiple authentication methods
- Easy key rotation

Cons:

- Requires Vault instance
- More operational overhead
- Network latency

Environment Variables:

```bash
PROVISIONING_SECRET_SOURCE=vault
PROVISIONING_VAULT_ENABLED=true
PROVISIONING_VAULT_ADDRESS=http://localhost:8200
PROVISIONING_VAULT_TOKEN=hvs.CAESIAoICQ...
```

Secret Storage Pattern:

```text
# KV v2 HTTP API path (note the data/ segment after the mount name)
GET /v1/secret/data/ssh-keys/{hostname}/{username}
# Stored fields are returned under data.data:
# {"data": {"data": {"key_content": "-----BEGIN OPENSSH PRIVATE KEY-----..."}}}
```

Setup Instructions:

```bash
# 1. Start Vault in dev mode (if not already running)
docker run -p 8200:8200 \
  -e VAULT_DEV_ROOT_TOKEN_ID=provisioning \
  hashicorp/vault server -dev

# 2. Create KV v2 mount (skip if it exists; dev mode already mounts secret/)
vault secrets enable -version=2 -path=secret kv

# 3. Store SSH key ($HOME, not ~: the shell does not expand ~ after "=@")
vault kv put secret/ssh-keys/web-01/ubuntu \
  key_content=@$HOME/.ssh/provisioning_web01

# 4. Configure environment
export PROVISIONING_SECRET_SOURCE=vault
export PROVISIONING_VAULT_ADDRESS=http://localhost:8200
export PROVISIONING_VAULT_TOKEN=provisioning

# 5. Create AppRole for production
vault auth enable approle
vault write auth/approle/role/provisioning \
  token_ttl=1h \
  token_max_ttl=4h
vault read auth/approle/role/provisioning/role-id
vault write -f auth/approle/role/provisioning/secret-id
```

### 4. Local-Dev (Fallback)

Local filesystem SSH keys (development only).

Pros:

- No setup required
- Fast (local filesystem)
- Works offline

Cons:

- NOT for production
- Hardcoded filesystem dependency
- No key rotation

Environment Variables:

```bash
PROVISIONING_ENVIRONMENT=local-dev
```

Behavior:

Standard paths checked (in order):

1. `$HOME/.ssh/id_rsa`
2. `$HOME/.ssh/id_ed25519`
3. `$HOME/.ssh/provisioning`
4. `$HOME/.ssh/provisioning_rsa`

## Auto-Detection Logic

When `PROVISIONING_SECRET_SOURCE` is not explicitly set, the system auto-detects the source in this order:

```text
1. PROVISIONING_SOPS_ENABLED=true or PROVISIONING_SOPS_SECRETS_FILE set?
   → Use SOPS
2. PROVISIONING_KMS_ENABLED=true or PROVISIONING_KMS_REGION set?
   → Use KMS
3. PROVISIONING_VAULT_ENABLED=true or both VAULT_ADDRESS and VAULT_TOKEN set?
   → Use Vault
4. Otherwise
   → Use local-dev (with warnings in production environments)
```

## Configuration Matrix

| Secret Source | Env Variables | Enabled in |
| --- | --- | --- |
| SOPS | `PROVISIONING_SOPS_*` | Development, Staging, Production |
| KMS | `PROVISIONING_KMS_*` | Staging, Production (with AWS) |
| Vault | `PROVISIONING_VAULT_*` | Development, Staging, Production |
| Local-dev | `PROVISIONING_ENVIRONMENT=local-dev` | Development only |

## Production Recommended Setup

### Minimal Setup (Single Source)

```bash
# Using Vault (recommended for self-hosted)
export PROVISIONING_SECRET_SOURCE=vault
export PROVISIONING_VAULT_ADDRESS=https://vault.example.com:8200
export PROVISIONING_VAULT_TOKEN=hvs.CAESIAoICQ...
export PROVISIONING_ENVIRONMENT=production
```

### Enhanced Setup (Fallback Chain)

```bash
# Primary: Vault
export PROVISIONING_VAULT_ADDRESS=https://vault.primary.com:8200
export PROVISIONING_VAULT_TOKEN=hvs.CAESIAoICQ...

# Fallback: SOPS (manual: set PROVISIONING_SECRET_SOURCE=sops if Vault is unavailable)
export PROVISIONING_SOPS_SECRETS_FILE=/etc/provisioning/secrets.enc.yaml
export PROVISIONING_SOPS_AGE_KEY_FILE=/etc/provisioning/.age/key

# Environment
export PROVISIONING_ENVIRONMENT=production
export PROVISIONING_SECRET_SOURCE=vault # Explicit: use Vault first
```

### High-Availability Setup

```bash
# Use KMS (managed service)
export PROVISIONING_SECRET_SOURCE=kms
export PROVISIONING_KMS_REGION=us-east-1
export AWS_PROFILE=provisioning-admin

# Or use Vault with HA
export PROVISIONING_VAULT_ADDRESS=https://vault-ha.example.com:8200
export PROVISIONING_VAULT_NAMESPACE=provisioning
export PROVISIONING_ENVIRONMENT=production
```

## Validation & Testing

### Check Configuration

```bash
# Nushell
provisioning secrets status

# Show secret source and configuration
provisioning secrets validate

# Detailed diagnostics
provisioning secrets diagnose
```

### Test SSH Key Retrieval

```bash
# Test specific host/user
provisioning secrets get-key web-01 ubuntu

# Test all configured hosts
provisioning secrets validate-all

# Dry-run SSH with retrieved key
provisioning ssh --test-key web-01 ubuntu
```

## Migration Path

### From Local-Dev to SOPS

```bash
# 1. Create SOPS secrets file with existing keys
cat > secrets.yaml << 'EOF'
ssh:
  web-01:
    ubuntu: ~/.ssh/provisioning_web01
  db-01:
    postgres: ~/.ssh/provisioning_db01
EOF

# 2. Encrypt with Age (recipient is the public key derived from the Age key file)
sops -e -i --age "$(age-keygen -y "$HOME/.age/provisioning")" secrets.yaml

# 3. Move to repo
mv secrets.yaml provisioning/secrets.enc.yaml

# 4. Update environment
export PROVISIONING_SECRET_SOURCE=sops
export PROVISIONING_SOPS_SECRETS_FILE=$(pwd)/provisioning/secrets.enc.yaml
export PROVISIONING_SOPS_AGE_KEY_FILE=$HOME/.age/provisioning
```

### From SOPS to Vault

```bash
# 1. Decrypt SOPS file to review which host/user entries need importing
SOPS_AGE_KEY_FILE=$HOME/.age/provisioning \
  sops -d provisioning/secrets.enc.yaml > /tmp/secrets.yaml

# 2. Import each key to Vault, then remove the plaintext copy
vault kv put secret/ssh-keys/web-01/ubuntu key_content=@$HOME/.ssh/provisioning_web01
rm -f /tmp/secrets.yaml

# 3. Update environment
export PROVISIONING_SECRET_SOURCE=vault
export PROVISIONING_VAULT_ADDRESS=https://vault.example.com:8200
export PROVISIONING_VAULT_TOKEN=hvs.CAESIAoICQ...

# 4. Validate retrieval works
provisioning secrets validate-all
```

## Security Best Practices

### 1. Never Commit Plaintext Secrets

```bash
# Add plaintext secrets and key material to .gitignore
# (the SOPS-encrypted provisioning/secrets.enc.yaml is meant to be versioned)
echo "secrets.yaml" >> .gitignore
echo ".age/provisioning" >> .gitignore
echo ".vault-token" >> .gitignore
```

### 2. Rotate Keys Regularly

```bash
# SOPS: Rotate Age key
age-keygen -o ~/.age/provisioning.new
# Re-encrypt all secrets files for the new key before retiring the old one

# KMS: Enable automatic rotation (this call needs a key ID or ARN, not an alias)
KEY_ID=$(aws kms describe-key --key-id alias/provisioning \
  --query KeyMetadata.KeyId --output text)
aws kms enable-key-rotation --key-id "$KEY_ID"

# Vault: Expire secret versions after 90 days
vault kv metadata put -delete-version-after=2160h \
  secret/ssh-keys/web-01/ubuntu
```

### 3. Restrict Access

```bash
# SOPS: Protect Age key
chmod 600 ~/.age/provisioning

# KMS: Restrict IAM permissions
aws iam put-user-policy --user-name provisioning \
  --policy-name ProvisioningSecretsAccess \
  --policy-document file://kms-policy.json

# Vault: Use AppRole for applications
vault write auth/approle/role/provisioning \
  token_ttl=1h \
  secret_id_ttl=30m
```

### 4. Audit Logging

```bash
# KMS: Enable CloudTrail
aws cloudtrail put-event-selectors \
  --trail-name provisioning-trail \
  --event-selectors ReadWriteType=All

# Vault: List enabled audit devices
vault audit list

# SOPS: Version control (encrypted)
git log -p provisioning/secrets.enc.yaml
```

## Troubleshooting

### SOPS Issues

```bash
# Test Age decryption
SOPS_AGE_KEY_FILE=$HOME/.age/provisioning sops -d provisioning/secrets.enc.yaml

# Verify Age key (prints the public key for the identity file)
age-keygen -y ~/.age/provisioning

# Regenerate if needed
# WARNING: files encrypted for the old key can no longer be decrypted
rm ~/.age/provisioning
age-keygen -o ~/.age/provisioning
```

### KMS Issues

```bash
# Test AWS credentials
aws sts get-caller-identity

# Check KMS key permissions
aws kms describe-key --key-id alias/provisioning

# List secrets
aws secretsmanager list-secrets --filters Key=name,Values=provisioning
```

### Vault Issues

```bash
# Check Vault status
vault status

# Test authentication
vault token lookup

# List secrets
vault kv list secret/ssh-keys/

# Check that audit devices are enabled
vault audit list
vault read sys/audit
```

## FAQ

Q: Can I use multiple secret sources simultaneously?
A: Yes. Configure multiple sources and set `PROVISIONING_SECRET_SOURCE` to select the primary. If the primary fails, you can switch to a secondary source manually.

Q: What happens if secret retrieval fails?
A: The system logs the error and fails fast. For security, there is no automatic fallback to the local filesystem.

Q: Can I cache SSH keys?
A: Not currently. Keys are retrieved fresh for each operation. Use OS-level caching (ssh-agent) if needed.

Q: How do I rotate keys?
A: Update the secret in your configured source (SOPS/KMS/Vault). The new key is retrieved on the next operation.

Q: Is local-dev mode secure?
A: No. It is for development only. Production requires SOPS, KMS, or Vault.

## Architecture

```text
SSH Operation
      ↓
SecretsManager (Nushell/Rust)
      ↓
[Detect Source]
      ↓
┌──────────────────────────────────────────────────────┐
│ SOPS          KMS          Vault        LocalDev     │
│ (Encrypted    (AWS KMS     (Self-       (Filesystem  │
│  Secrets)      Service)     Hosted)      Dev Only)   │
└──────────────────────────────────────────────────────┘
      ↓
Return SSH Key Path/Content
      ↓
SSH Operation Completes
```

## Integration with SSH Utilities

SSH operations use the secrets manager automatically:

```nu
# Automatic secret retrieval
ssh-cmd-smart $settings $server false "command" $ip
# Internally:
# 1. Determine secret source
# 2. Retrieve SSH key for server.installer_user@ip
# 3. Execute SSH with retrieved key
# 4. Cleanup sensitive data

# Batch operations also integrate
ssh-batch-execute $servers $settings "command"
# Per-host: Retrieves key → executes → cleans up
```

---

For Support: See `docs/user/TROUBLESHOOTING_GUIDE.md`
For Integration: See `provisioning/core/nulib/lib_provisioning/platform/secrets.nu`
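
## Appendix: Auto-Detection Sketch

The order described under Auto-Detection Logic can be sketched as a small shell function. This is an illustration only, not the code in `secrets.nu`: the function name `detect_secret_source` is hypothetical, and it assumes that "VAULT_ADDRESS and VAULT_TOKEN" in step 3 refer to `PROVISIONING_VAULT_ADDRESS` and `PROVISIONING_VAULT_TOKEN`.

```bash
# Hypothetical helper mirroring the documented detection order.
# Prints the selected source name on stdout.
detect_secret_source() {
  # An explicit setting always wins over auto-detection.
  if [ -n "${PROVISIONING_SECRET_SOURCE:-}" ]; then
    printf '%s\n' "$PROVISIONING_SECRET_SOURCE"
    return 0
  fi

  # 1. SOPS: enabled flag or a secrets file path.
  if [ "${PROVISIONING_SOPS_ENABLED:-}" = "true" ] ||
     [ -n "${PROVISIONING_SOPS_SECRETS_FILE:-}" ]; then
    echo "sops"
    return 0
  fi

  # 2. KMS: enabled flag or a region.
  if [ "${PROVISIONING_KMS_ENABLED:-}" = "true" ] ||
     [ -n "${PROVISIONING_KMS_REGION:-}" ]; then
    echo "kms"
    return 0
  fi

  # 3. Vault: enabled flag, or address AND token together
  #    (variable names assumed, see the note above).
  if [ "${PROVISIONING_VAULT_ENABLED:-}" = "true" ]; then
    echo "vault"
    return 0
  fi
  if [ -n "${PROVISIONING_VAULT_ADDRESS:-}" ] &&
     [ -n "${PROVISIONING_VAULT_TOKEN:-}" ]; then
    echo "vault"
    return 0
  fi

  # 4. Nothing configured: local-dev, with a warning in production.
  if [ "${PROVISIONING_ENVIRONMENT:-}" = "production" ]; then
    echo "warning: falling back to local-dev in production" >&2
  fi
  echo "local-dev"
}
```

Calling `detect_secret_source` with only `PROVISIONING_KMS_REGION` set selects `kms`, while setting a Vault address without a token is not enough to select `vault`. Note that SOPS takes precedence over KMS and Vault when several sources are configured, so setting `PROVISIONING_SECRET_SOURCE` explicitly is the safer choice in production.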