# ADR-007: KMS Service Simplification to Age and Cosmian Backends

**Status**: Accepted

**Date**: 2025-10-08

**Deciders**: Architecture Team

**Related**: ADR-006 (KMS Service Integration)

## Context

The KMS service initially supported 4 backends: HashiCorp Vault, AWS KMS, Age, and Cosmian KMS. This created unnecessary complexity and left no clear guidance on which backend to use in which environment.

### Problems with 4-Backend Approach

1. **Complexity**: Supporting 4 different backends increased the maintenance burden
2. **Dependencies**: The AWS SDK added significant compile time (~30 s) and binary size
3. **Confusion**: No clear guidance on which backend to use when
4. **Cloud Lock-in**: The AWS KMS dependency limited infrastructure flexibility
5. **Operational Overhead**: Vault requires a running server even for simple dev environments
6. **Code Duplication**: Similar logic was implemented 4 different ways

### Key Insights

- Most development work doesn't need a server-based KMS
- Production deployments need enterprise-grade security features
- Age provides fast, offline encryption that is well suited to development
- Cosmian KMS offers confidential computing and a zero-knowledge architecture
- Supporting both Vault and Cosmian is redundant (both are server-based KMSs)
- AWS KMS locks us into AWS infrastructure

## Decision

Simplify the KMS service to support only 2 backends:

1. **Age**: For development and local testing
   - Fast, offline, no server required
   - Simple key generation with `age-keygen`
   - X25519 key agreement with ChaCha20-Poly1305 payload encryption (modern, well-reviewed primitives)
   - Well suited to dev/test environments
2. **Cosmian KMS**: For production deployments
   - Enterprise-grade key management
   - Confidential computing support (SGX/SEV)
   - Zero-knowledge architecture
   - Server-side key rotation
   - Audit logging and compliance
   - Multi-tenant support

Remove support for:

- ❌ HashiCorp Vault (redundant with Cosmian)
- ❌ AWS KMS (cloud lock-in, complexity)

## Consequences

### Positive

1. **Simpler Code**: 2 backends instead of 4 halves the number of backend implementations to maintain
2. **Faster Compilation**: Removing the AWS SDK saves ~30 seconds of compile time
3. **Clear Guidance**: Age for dev, Cosmian for prod
4. **Offline Development**: Age works without network connectivity
5. **Better Security**: Cosmian provides confidential computing (TEE)
6. **No Cloud Lock-in**: No dependency on AWS infrastructure
7. **Easier Testing**: The Age backend requires no server setup
8. **Reduced Dependencies**: Fewer external crates to maintain

### Negative

1. **Migration Required**: Existing Vault/AWS KMS users must migrate
2. **Learning Curve**: Teams must learn Age and Cosmian
3. **Cosmian Dependency**: Production depends on Cosmian availability
4. **Cost**: Cosmian may have licensing costs (cloud or self-hosted)

### Neutral

1. **Feature Parity**: Cosmian provides all the features Vault/AWS KMS had
2. **API Compatibility**: The encrypt/decrypt API remains largely unchanged
3. **Configuration Change**: The TOML config structure is updated but similar

## Implementation

### Files Created

1. `src/age/client.rs` (167 lines) - Age encryption client
2. `src/age/mod.rs` (3 lines) - Age module exports
3. `src/cosmian/client.rs` (294 lines) - Cosmian KMS client
4. `src/cosmian/mod.rs` (3 lines) - Cosmian module exports
5. `docs/migration/KMS_SIMPLIFICATION.md` (500+ lines) - Migration guide

### Files Modified

1. `src/lib.rs` - Updated exports (`age`, `cosmian` instead of `aws`, `vault`)
2. `src/types.rs` - Updated error types and config enum
3. `src/service.rs` - Simplified to 2 backends (180 lines, was 213)
4. `Cargo.toml` - Removed AWS deps, added `age = "0.10"`
5. `README.md` - Complete rewrite for the new backends
6. `provisioning/config/kms.toml` - Simplified configuration

### Files Deleted

1. `src/aws/client.rs` - AWS KMS client
2. `src/aws/envelope.rs` - Envelope encryption helpers
3. `src/aws/mod.rs` - AWS module
4. `src/vault/client.rs` - Vault client
5. `src/vault/mod.rs` - Vault module

### Dependencies Changed

**Removed**:

- `aws-sdk-kms = "1"`
- `aws-config = "1"`
- `aws-credential-types = "1"`
- `aes-gcm = "0.10"` (used only for AWS envelope encryption)

**Added**:

- `age = "0.10"`
- `tempfile = "3"` (dev-dependency for tests)

**Kept**:

- All Axum web framework deps
- `reqwest` (for the Cosmian HTTP API)
- `base64`, `serde`, `tokio`, etc.

## Migration Path

### For Development

```bash
# 1. Install Age
brew install age  # or: apt install age

# 2. Generate keys (age-keygen does not create missing directories)
mkdir -p ~/.config/provisioning/age
age-keygen -o ~/.config/provisioning/age/private_key.txt
age-keygen -y ~/.config/provisioning/age/private_key.txt > ~/.config/provisioning/age/public_key.txt

# 3. Update config to use the Age backend
# 4. Re-encrypt development secrets
```

### For Production

```text
# 1. Set up Cosmian KMS (cloud or self-hosted)
# 2. Create a master key in Cosmian
# 3. Migrate secrets from Vault/AWS to Cosmian
# 4. Update the production config
# 5. Deploy the new KMS service
```

See `docs/migration/KMS_SIMPLIFICATION.md` for detailed steps.
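Migration step 3 (switching the config to a supported backend) and the requirement that old configs fail gracefully suggest a two-variant config enum that explicitly rejects the removed backends. The following is a minimal std-only sketch under that assumption. `KmsBackend`, `ConfigError`, `select_backend`, and the setting keys `public_key_path`, `private_key_path`, and `server_url` are hypothetical names, not the crate's actual API. The real service reads `provisioning/config/kms.toml`, and this ADR does not specify that file's exact schema.

```rust
use std::collections::HashMap;
use std::fmt;

/// Hypothetical config enum: one variant per backend kept by ADR-007.
/// Field names are illustrative, not the crate's real schema.
#[derive(Debug, Clone, PartialEq)]
pub enum KmsBackend {
    /// Local, offline Age encryption for development and tests.
    Age {
        public_key_path: String,
        private_key_path: String,
    },
    /// Server-based Cosmian KMS for production.
    Cosmian { server_url: String },
}

/// Hypothetical error type for backend selection.
#[derive(Debug, Clone, PartialEq)]
pub enum ConfigError {
    /// A backend removed by ADR-007 (Vault, AWS KMS).
    RemovedBackend(String),
    /// A name that was never supported.
    UnknownBackend(String),
    /// A setting required by the chosen backend is absent.
    MissingSetting(&'static str),
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConfigError::RemovedBackend(name) => write!(
                f,
                "KMS backend '{name}' was removed (ADR-007); use 'age' for development or 'cosmian' for production"
            ),
            ConfigError::UnknownBackend(name) => write!(f, "unknown KMS backend '{name}'"),
            ConfigError::MissingSetting(key) => write!(f, "missing required setting '{key}'"),
        }
    }
}

impl std::error::Error for ConfigError {}

/// Looks up a required setting and reports which key is missing.
fn required(settings: &HashMap<&str, &str>, key: &'static str) -> Result<String, ConfigError> {
    settings
        .get(key)
        .map(|value| value.to_string())
        .ok_or(ConfigError::MissingSetting(key))
}

/// Maps a configured backend name (case-insensitive) to a `KmsBackend`.
/// Removed backends get a migration hint instead of a generic error.
pub fn select_backend(
    name: &str,
    settings: &HashMap<&str, &str>,
) -> Result<KmsBackend, ConfigError> {
    let normalized = name.to_ascii_lowercase();
    match normalized.as_str() {
        "age" => Ok(KmsBackend::Age {
            public_key_path: required(settings, "public_key_path")?,
            private_key_path: required(settings, "private_key_path")?,
        }),
        "cosmian" => Ok(KmsBackend::Cosmian {
            server_url: required(settings, "server_url")?,
        }),
        "vault" | "aws" | "aws-kms" => Err(ConfigError::RemovedBackend(name.to_string())),
        _ => Err(ConfigError::UnknownBackend(name.to_string())),
    }
}
```

In this sketch, a stale config naming `vault` or `aws` produces a message that points at the two supported backends rather than a generic "unknown backend" error. A migration test can assert on exactly that behavior.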
## Alternatives Considered

### Alternative 1: Keep All 4 Backends

**Pros**:

- No migration required
- Maximum flexibility

**Cons**:

- Continued complexity
- Maintenance burden
- Unclear guidance

**Rejected**: Complexity outweighs benefits

### Alternative 2: Only Cosmian (No Age)

**Pros**:

- Single backend
- Enterprise-grade everywhere

**Cons**:

- Requires a Cosmian server for development
- Slower dev iteration
- Network dependency for local dev

**Rejected**: Development experience matters

### Alternative 3: Only Age (No Production Backend)

**Pros**:

- Simplest solution
- No server required

**Cons**:

- Not suitable for production
- No audit logging
- No key rotation
- No multi-tenant support

**Rejected**: Production needs enterprise features

### Alternative 4: Age + HashiCorp Vault

**Pros**:

- Vault is widely known
- No Cosmian dependency

**Cons**:

- Vault lacks confidential computing
- Vault server still required
- No zero-knowledge architecture

**Rejected**: Cosmian provides better security features

## Metrics

### Code Reduction

- **Total Lines Removed**: ~800 lines (AWS + Vault implementations)
- **Total Lines Added**: ~470 lines (Age + Cosmian client and module files; the 500+ line migration guide is not counted)
- **Net Reduction**: ~330 lines

### Dependency Reduction

- **Crates Removed**: 4 (`aws-sdk-kms`, `aws-config`, `aws-credential-types`, `aes-gcm`)
- **Crates Added**: 1 (`age`), plus `tempfile` as a dev-dependency
- **Net Reduction**: 3 runtime crates

### Compilation Time

- **Before**: ~90 seconds (with AWS SDK)
- **After**: ~60 seconds (without AWS SDK)
- **Improvement**: ~33% less compile time (~30 seconds saved)

## Compliance

### Security Considerations

1. **Age Security**: X25519 (Curve25519) key agreement with ChaCha20-Poly1305 encryption, modern and secure
2. **Cosmian Security**: Confidential computing, zero-knowledge, enterprise-grade
3. **No Regression**: Security features maintained or improved
4. **Clear Separation**: The dev backend (Age) is never used for production secrets

### Testing Requirements

1. **Unit Tests**: Both backends have comprehensive test coverage
2. **Integration Tests**: Age tests run without external dependencies
3. **Cosmian Tests**: Require a test server (marked `#[ignore]`)
4. **Migration Tests**: Verify that old configs fail gracefully

## References

- [Age Encryption](https://github.com/FiloSottile/age) - Modern encryption tool
- [Cosmian KMS](https://cosmian.com/kms/) - Enterprise KMS with confidential computing
- [ADR-006](adr-006-provisioning-cli-refactoring.md) - Previous KMS integration
- [Migration Guide](../migration/KMS_SIMPLIFICATION.md) - Detailed migration steps

## Notes

- Age was designed by Filippo Valsorda (then on Google's Go security team)
- Cosmian provides FIPS 140-2 Level 3 compliance (when using certified hardware)
- This decision aligns with the project goal of reducing cloud provider dependencies
- Migration timeline: 6 weeks for full adoption