2025-12-11 21:50:42 +00:00
# ADR-007: KMS Service Simplification to Age and Cosmian Backends
**Status**: Accepted
**Date**: 2025-10-08
**Deciders**: Architecture Team
**Related**: ADR-006 (KMS Service Integration)
## Context
The KMS service initially supported 4 backends: HashiCorp Vault, AWS KMS, Age, and Cosmian KMS. This created unnecessary complexity and unclear guidance about which backend to use for different environments.
### Problems with 4-Backend Approach
1. **Complexity**: Supporting 4 different backends increased maintenance burden
2. **Dependencies**: AWS SDK added significant compile time (~30 s) and binary size
3. **Confusion**: No clear guidance on which backend to use when
4. **Cloud Lock-in**: AWS KMS dependency limited infrastructure flexibility
5. **Operational Overhead**: Vault requires server setup even for simple dev environments
6. **Code Duplication**: Similar logic implemented 4 different ways
### Key Insights
- Most development work doesn't need server-based KMS
- Production deployments need enterprise-grade security features
- Age provides fast, offline encryption perfect for development
- Cosmian KMS offers confidential computing and zero-knowledge architecture
- Supporting Vault AND Cosmian is redundant (both are server-based KMS)
- AWS KMS locks us into AWS infrastructure
## Decision
Simplify the KMS service to support only 2 backends:
1. **Age**: For development and local testing
   - Fast, offline, no server required
   - Simple key generation with `age-keygen`
   - X25519 key agreement with ChaCha20-Poly1305 payload encryption (modern, secure)
   - Well suited to dev/test environments
2. **Cosmian KMS**: For production deployments
   - Enterprise-grade key management
   - Confidential computing support (SGX/SEV)
   - Zero-knowledge architecture
   - Server-side key rotation
   - Audit logging and compliance
   - Multi-tenant support
Remove support for:
- ❌ HashiCorp Vault (redundant with Cosmian)
- ❌ AWS KMS (cloud lock-in, complexity)
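The two-backend split can be pictured as a small configuration enum. The following is a minimal sketch, not the service's actual `src/types.rs`: the type names, variant fields, and the `recommended_backend` helper are all hypothetical and only illustrate the "Age = dev, Cosmian = prod" rule.

```rust
use std::path::PathBuf;

/// Deployment environments (hypothetical names, not taken from the codebase).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Environment {
    Development,
    Production,
}

/// The two remaining backends; the fields are illustrative assumptions.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum KmsBackendConfig {
    /// Offline, file-based keys for development and local testing.
    Age {
        public_key_path: PathBuf,
        private_key_path: PathBuf,
    },
    /// Server-based KMS for production deployments.
    Cosmian { server_url: String, key_id: String },
}

impl KmsBackendConfig {
    /// Short backend name, as it might appear in a config file.
    pub fn name(&self) -> &'static str {
        match self {
            KmsBackendConfig::Age { .. } => "age",
            KmsBackendConfig::Cosmian { .. } => "cosmian",
        }
    }
}

/// Backend recommended for each environment under this ADR.
pub fn recommended_backend(env: Environment) -> &'static str {
    match env {
        Environment::Development => "age",
        Environment::Production => "cosmian",
    }
}
```

With only two variants, every `match` over the backend stays exhaustive and short, which is where most of the simplification is expected to show up.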
## Consequences
### Positive
1. **Simpler Code**: 2 backends instead of 4 halves the number of implementations to maintain
2. **Faster Compilation**: Removing AWS SDK saves ~30 seconds compile time
3. **Clear Guidance**: Age = dev, Cosmian = prod (no confusion)
4. **Offline Development**: Age works without network connectivity
5. **Better Security**: Cosmian provides confidential computing (TEE)
6. **No Cloud Lock-in**: Not dependent on AWS infrastructure
7. **Easier Testing**: Age backend requires no server setup
8. **Reduced Dependencies**: Fewer external crates to maintain
### Negative
1. **Migration Required**: Existing Vault/AWS KMS users must migrate
2. **Learning Curve**: Teams must learn Age and Cosmian
3. **Cosmian Dependency**: Production depends on Cosmian availability
4. **Cost**: Cosmian may have licensing costs (cloud or self-hosted)
### Neutral
1. **Feature Parity**: Cosmian provides all features Vault/AWS had
2. **API Compatibility**: Encrypt/decrypt API remains largely the same
3. **Configuration Change**: TOML config structure updated but similar
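One way to read the API-compatibility point is that callers program against a single encrypt/decrypt interface, whichever backend is configured. The sketch below assumes a hypothetical `KmsBackend` trait and `KmsError` type; `PassthroughBackend` is a test double that only prepends a marker and performs no encryption at all.

```rust
use std::fmt;

/// Error type shared by all backends (hypothetical).
#[derive(Debug, PartialEq, Eq)]
pub enum KmsError {
    Decrypt(String),
}

impl fmt::Display for KmsError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            KmsError::Decrypt(msg) => write!(f, "decrypt failed: {}", msg),
        }
    }
}

impl std::error::Error for KmsError {}

/// Backend-agnostic interface: the same two operations for Age and Cosmian.
pub trait KmsBackend {
    fn encrypt(&self, plaintext: &[u8]) -> Result<Vec<u8>, KmsError>;
    fn decrypt(&self, ciphertext: &[u8]) -> Result<Vec<u8>, KmsError>;
}

/// Test double. It only prepends a marker; this is NOT encryption.
pub struct PassthroughBackend;

const MARKER: &[u8] = b"fake:";

impl KmsBackend for PassthroughBackend {
    fn encrypt(&self, plaintext: &[u8]) -> Result<Vec<u8>, KmsError> {
        let mut out = MARKER.to_vec();
        out.extend_from_slice(plaintext);
        Ok(out)
    }

    fn decrypt(&self, ciphertext: &[u8]) -> Result<Vec<u8>, KmsError> {
        ciphertext
            .strip_prefix(MARKER)
            .map(|rest| rest.to_vec())
            .ok_or_else(|| KmsError::Decrypt("missing marker".to_string()))
    }
}

/// Code written against the trait does not change when the backend does.
pub fn round_trip(backend: &dyn KmsBackend, data: &[u8]) -> Result<Vec<u8>, KmsError> {
    let ciphertext = backend.encrypt(data)?;
    backend.decrypt(&ciphertext)
}
```

A real Age or Cosmian client would implement the same trait, so call sites such as `round_trip` would stay untouched during migration.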
## Implementation
### Files Created
1. `src/age/client.rs` (167 lines) - Age encryption client
2. `src/age/mod.rs` (3 lines) - Age module exports
3. `src/cosmian/client.rs` (294 lines) - Cosmian KMS client
4. `src/cosmian/mod.rs` (3 lines) - Cosmian module exports
5. `docs/migration/KMS_SIMPLIFICATION.md` (500+ lines) - Migration guide
### Files Modified
1. `src/lib.rs` - Updated exports (age, cosmian instead of aws, vault)
2. `src/types.rs` - Updated error types and config enum
3. `src/service.rs` - Simplified to 2 backends (180 lines, was 213)
4. `Cargo.toml` - Removed AWS deps, added `age = "0.10"`
5. `README.md` - Complete rewrite for new backends
6. `provisioning/config/kms.toml` - Simplified configuration
### Files Deleted
1. `src/aws/client.rs` - AWS KMS client
2. `src/aws/envelope.rs` - Envelope encryption helpers
3. `src/aws/mod.rs` - AWS module
4. `src/vault/client.rs` - Vault client
5. `src/vault/mod.rs` - Vault module
### Dependencies Changed
**Removed**:
- `aws-sdk-kms = "1"`
- `aws-config = "1"`
- `aws-credential-types = "1"`
- `aes-gcm = "0.10"` (was only for AWS envelope encryption)
**Added**:
- `age = "0.10"`
- `tempfile = "3"` (dev dependency for tests)
**Kept**:
- All Axum web framework deps
- `reqwest` (for Cosmian HTTP API)
- `base64`, `serde`, `tokio`, etc.
## Migration Path
### For Development
```bash
# 1. Install Age
brew install age # or apt install age
# 2. Generate keys (create the directory first; age-keygen does not create it)
mkdir -p ~/.config/provisioning/age
age-keygen -o ~/.config/provisioning/age/private_key.txt
age-keygen -y ~/.config/provisioning/age/private_key.txt > ~/.config/provisioning/age/public_key.txt
# 3. Update config to use Age backend
# 4. Re-encrypt development secrets
```
### For Production
```bash
# 1. Set up Cosmian KMS (cloud or self-hosted)
# 2. Create master key in Cosmian
# 3. Migrate secrets from Vault/AWS to Cosmian
# 4. Update production config
# 5. Deploy new KMS service
```
See `docs/migration/KMS_SIMPLIFICATION.md` for detailed steps.
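The re-encryption step in both paths amounts to decrypting each secret with the old backend and encrypting it again with the new one. The following is a rough sketch under assumptions: `reencrypt_all` is a hypothetical helper, the two closures stand in for real backend clients, and secrets are modelled as an in-memory map rather than files.

```rust
use std::collections::BTreeMap;

/// Re-encrypts every secret: decrypt with the old backend, encrypt with the new one.
/// Stops at the first failure and names the secret, so nothing is silently skipped.
pub fn reencrypt_all<D, E>(
    secrets: &BTreeMap<String, Vec<u8>>,
    old_decrypt: D,
    new_encrypt: E,
) -> Result<BTreeMap<String, Vec<u8>>, String>
where
    D: Fn(&[u8]) -> Result<Vec<u8>, String>,
    E: Fn(&[u8]) -> Result<Vec<u8>, String>,
{
    let mut migrated = BTreeMap::new();
    for (name, old_ciphertext) in secrets {
        let plaintext = old_decrypt(old_ciphertext.as_slice())
            .map_err(|e| format!("decrypt failed for {}: {}", name, e))?;
        let new_ciphertext = new_encrypt(plaintext.as_slice())
            .map_err(|e| format!("encrypt failed for {}: {}", name, e))?;
        migrated.insert(name.clone(), new_ciphertext);
    }
    Ok(migrated)
}
```

Returning a new map instead of mutating in place keeps the old ciphertexts available until the whole batch has succeeded, which seems like the safer default for a one-off migration.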
## Alternatives Considered
### Alternative 1: Keep All 4 Backends
**Pros**:
- No migration required
- Maximum flexibility
**Cons**:
- Continued complexity
- Maintenance burden
- Unclear guidance
**Rejected**: Complexity outweighs benefits
### Alternative 2: Only Cosmian (No Age)
**Pros**:
- Single backend
- Enterprise-grade everywhere
**Cons**:
- Requires Cosmian server for development
- Slower dev iteration
- Network dependency for local dev
**Rejected**: Development experience matters
### Alternative 3: Only Age (No Production Backend)
**Pros**:
- Simplest solution
- No server required
**Cons**:
- Not suitable for production
- No audit logging
- No key rotation
- No multi-tenant support
**Rejected**: Production needs enterprise features
### Alternative 4: Age + HashiCorp Vault
**Pros**:
- Vault is widely known
- No Cosmian dependency
**Cons**:
- Vault lacks confidential computing
- Vault server still required
- No zero-knowledge architecture
**Rejected**: Cosmian provides better security features
## Metrics
### Code Reduction
- **Total Lines Removed**: ~800 lines (AWS + Vault implementations)
- **Total Lines Added**: ~470 lines (Age + Cosmian implementations; excludes the 500+ line migration guide)
- **Net Reduction**: ~330 lines of code
### Dependency Reduction
- **Crates Removed**: 4 (aws-sdk-kms, aws-config, aws-credential-types, aes-gcm)
- **Crates Added**: 1 (age), plus `tempfile` as a dev-only dependency
- **Net Reduction**: 3 crates (excluding dev dependencies)
### Compilation Time
- **Before**: ~90 seconds (with AWS SDK)
- **After**: ~60 seconds (without AWS SDK)
- **Improvement**: ~33% less compile time (about 1.5x faster)
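As a quick check, the headline figures follow from the approximate numbers quoted in this section (all inputs are the ADR's own estimates):

$$
\begin{aligned}
\text{net lines removed} &\approx 800 - 470 = 330 \\
\text{compile time saved} &\approx \frac{90\,\text{s} - 60\,\text{s}}{90\,\text{s}} = \frac{1}{3} \approx 33\% \\
\text{speedup} &\approx \frac{90\,\text{s}}{60\,\text{s}} = 1.5\times
\end{aligned}
$$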
## Compliance
### Security Considerations
1. **Age Security**: X25519 (Curve25519) key agreement with ChaCha20-Poly1305 payload encryption, modern and secure
2. **Cosmian Security**: Confidential computing, zero-knowledge, enterprise-grade
3. **No Regression**: Security features maintained or improved
4. **Clear Separation**: Dev (Age) never used for production secrets
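The separation rule could be enforced at startup rather than left to convention. Below is a minimal sketch assuming hypothetical `Environment` and `BackendKind` enums and a hypothetical `check_separation` guard; the ADR does not say whether the service performs such a check.

```rust
/// Deployment environments (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Environment {
    Development,
    Production,
}

/// The two supported backends.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BackendKind {
    Age,
    Cosmian,
}

/// Rejects the one combination this ADR rules out: Age in production.
/// Cosmian in development is allowed, since the ADR does not forbid it.
pub fn check_separation(env: Environment, backend: BackendKind) -> Result<(), String> {
    match (env, backend) {
        (Environment::Production, BackendKind::Age) => {
            Err("Age backend must not be used for production secrets".to_string())
        }
        _ => Ok(()),
    }
}
```

Calling such a guard once while loading configuration would turn a misconfigured production deployment into a startup error instead of a silent policy violation.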
### Testing Requirements
1. **Unit Tests**: Both backends have comprehensive test coverage
2. **Integration Tests**: Age tests run without external deps
3. **Cosmian Tests**: Require test server (marked as `#[ignore]`)
4. **Migration Tests**: Verify old configs fail gracefully
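"Fail gracefully" for old configs can be read as: a removed backend name yields a descriptive error, not a panic or a generic parse failure. The sketch below assumes the backend is selected by a string in the config; the accepted strings (`"vault"`, `"aws"`, `"aws-kms"`) and the `BackendKind` and `ConfigError` types are hypothetical.

```rust
use std::fmt;
use std::str::FromStr;

/// The two supported backends.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BackendKind {
    Age,
    Cosmian,
}

/// Configuration errors; removed backends get their own variant so the
/// message can point users at the migration path.
#[derive(Debug, PartialEq, Eq)]
pub enum ConfigError {
    RemovedBackend(String),
    UnknownBackend(String),
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConfigError::RemovedBackend(name) => write!(
                f,
                "backend '{}' was removed by ADR-007; migrate to 'age' or 'cosmian'",
                name
            ),
            ConfigError::UnknownBackend(name) => write!(f, "unknown backend '{}'", name),
        }
    }
}

impl std::error::Error for ConfigError {}

impl FromStr for BackendKind {
    type Err = ConfigError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "age" => Ok(BackendKind::Age),
            "cosmian" => Ok(BackendKind::Cosmian),
            // Assumed spellings of the removed backends.
            "vault" | "aws" | "aws-kms" => Err(ConfigError::RemovedBackend(s.to_string())),
            _ => Err(ConfigError::UnknownBackend(s.to_string())),
        }
    }
}
```

A migration test would then assert on the error variant, for example that `"vault".parse::<BackendKind>()` returns `ConfigError::RemovedBackend`.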
## References
- [Age Encryption](https://github.com/FiloSottile/age) - Modern encryption tool
- [Cosmian KMS](https://cosmian.com/kms/) - Enterprise KMS with confidential computing
- [ADR-006](ADR-006-provisioning-cli-refactoring.md) - Previous KMS integration
- [Migration Guide](../migration/KMS_SIMPLIFICATION.md) - Detailed migration steps
## Notes
- Age was designed by Filippo Valsorda (formerly of the Go security team at Google) and Ben Cartwright-Cox
- Cosmian provides FIPS 140-2 Level 3 compliance (when using certified hardware)
- This decision aligns with the project goal of reducing cloud provider dependencies
- Migration timeline: 6 weeks for full adoption