provisioning/docs/src/architecture/adr/ADR-007-kms-simplification.md
2026-01-14 04:53:21 +00:00

ADR-007: KMS Service Simplification to Age and Cosmian Backends

Status: Accepted
Date: 2025-10-08
Deciders: Architecture Team
Related: ADR-006 (KMS Service Integration)

Context

The KMS service initially supported 4 backends: HashiCorp Vault, AWS KMS, Age, and Cosmian KMS. This created unnecessary complexity and unclear guidance about which backend to use for different environments.

Problems with 4-Backend Approach

  1. Complexity: Supporting 4 different backends increased maintenance burden
  2. Dependencies: AWS SDK added significant compile time (~30 s) and binary size
  3. Confusion: No clear guidance on which backend to use when
  4. Cloud Lock-in: AWS KMS dependency limited infrastructure flexibility
  5. Operational Overhead: Vault requires server setup even for simple dev environments
  6. Code Duplication: Similar logic implemented 4 different ways

Key Insights

  • Most development work doesn't need server-based KMS
  • Production deployments need enterprise-grade security features
  • Age provides fast, offline encryption perfect for development
  • Cosmian KMS offers confidential computing and zero-knowledge architecture
  • Supporting Vault AND Cosmian is redundant (both are server-based KMS)
  • AWS KMS locks us into AWS infrastructure

Decision

Simplify the KMS service to support only 2 backends:

  1. Age: For development and local testing

    • Fast, offline, no server required
    • Simple key generation with age-keygen
    • X25519 key agreement with ChaCha20-Poly1305 payload encryption (modern, secure)
    • Perfect for dev/test environments
  2. Cosmian KMS: For production deployments

    • Enterprise-grade key management
    • Confidential computing support (SGX/SEV)
    • Zero-knowledge architecture
    • Server-side key rotation
    • Audit logging and compliance
    • Multi-tenant support

Remove support for:

  • HashiCorp Vault (redundant with Cosmian)
  • AWS KMS (cloud lock-in, complexity)
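The two-backend decision can be pictured as a closed enum plus one shared encrypt/decrypt trait. The following is a minimal sketch, not the service's actual code: `BackendKind`, `KmsBackend`, `PlaceholderBackend`, and `round_trip` are hypothetical names, and the placeholder only tags data with a prefix; it performs no real cryptography.

```rust
use std::fmt;

/// The only two backends the simplified service accepts (hypothetical type name).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BackendKind {
    /// Development and local testing.
    Age,
    /// Production deployments.
    Cosmian,
}

impl fmt::Display for BackendKind {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            BackendKind::Age => write!(f, "age"),
            BackendKind::Cosmian => write!(f, "cosmian"),
        }
    }
}

/// Shared interface both backends would implement (assumed shape).
pub trait KmsBackend {
    fn kind(&self) -> BackendKind;
    fn encrypt(&self, plaintext: &[u8]) -> Result<Vec<u8>, String>;
    fn decrypt(&self, ciphertext: &[u8]) -> Result<Vec<u8>, String>;
}

/// Illustration-only stand-in: it prefixes data with the backend name
/// and does NOT encrypt anything.
pub struct PlaceholderBackend {
    kind: BackendKind,
}

impl PlaceholderBackend {
    pub fn new(kind: BackendKind) -> Self {
        Self { kind }
    }
}

impl KmsBackend for PlaceholderBackend {
    fn kind(&self) -> BackendKind {
        self.kind
    }

    fn encrypt(&self, plaintext: &[u8]) -> Result<Vec<u8>, String> {
        let mut out = format!("{}:", self.kind).into_bytes();
        out.extend_from_slice(plaintext);
        Ok(out)
    }

    fn decrypt(&self, ciphertext: &[u8]) -> Result<Vec<u8>, String> {
        let prefix = format!("{}:", self.kind).into_bytes();
        if ciphertext.starts_with(&prefix) {
            Ok(ciphertext[prefix.len()..].to_vec())
        } else {
            Err(format!("not a {} payload", self.kind))
        }
    }
}

/// Callers depend on the trait only, so dev and prod share one code path.
pub fn round_trip(backend: &dyn KmsBackend, data: &[u8]) -> Result<Vec<u8>, String> {
    let ciphertext = backend.encrypt(data)?;
    backend.decrypt(&ciphertext)
}
```

With this shape, service code that calls `round_trip` (or any other function taking `&dyn KmsBackend`) does not need to know which of the two backends is configured.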

Consequences

Positive

  1. Simpler Code: 2 backends instead of 4 halves the number of backend implementations to maintain
  2. Faster Compilation: Removing AWS SDK saves ~30 seconds compile time
  3. Clear Guidance: Age = dev, Cosmian = prod (no confusion)
  4. Offline Development: Age works without network connectivity
  5. Better Security: Cosmian provides confidential computing (TEE)
  6. No Cloud Lock-in: Not dependent on AWS infrastructure
  7. Easier Testing: Age backend requires no setup
  8. Reduced Dependencies: Fewer external crates to maintain

Negative

  1. Migration Required: Existing Vault/AWS KMS users must migrate
  2. Learning Curve: Teams must learn Age and Cosmian
  3. Cosmian Dependency: Production depends on Cosmian availability
  4. Cost: Cosmian may have licensing costs (cloud or self-hosted)

Neutral

  1. Feature Parity: Cosmian provides all features Vault/AWS had
  2. API Compatibility: Encrypt/decrypt API remains largely unchanged
  3. Configuration Change: TOML config structure updated but similar

Implementation

Files Created

  1. src/age/client.rs (167 lines) - Age encryption client
  2. src/age/mod.rs (3 lines) - Age module exports
  3. src/cosmian/client.rs (294 lines) - Cosmian KMS client
  4. src/cosmian/mod.rs (3 lines) - Cosmian module exports
  5. docs/migration/KMS_SIMPLIFICATION.md (500+ lines) - Migration guide

Files Modified

  1. src/lib.rs - Updated exports (age, cosmian instead of aws, vault)
  2. src/types.rs - Updated error types and config enum
  3. src/service.rs - Simplified to 2 backends (180 lines, was 213)
  4. Cargo.toml - Removed AWS deps, added age = "0.10"
  5. README.md - Complete rewrite for new backends
  6. provisioning/config/kms.toml - Simplified configuration

Files Deleted

  1. src/aws/client.rs - AWS KMS client
  2. src/aws/envelope.rs - Envelope encryption helpers
  3. src/aws/mod.rs - AWS module
  4. src/vault/client.rs - Vault client
  5. src/vault/mod.rs - Vault module

Dependencies Changed

Removed:

  • aws-sdk-kms = "1"
  • aws-config = "1"
  • aws-credential-types = "1"
  • aes-gcm = "0.10" (was only for AWS envelope encryption)

Added:

  • age = "0.10"
  • tempfile = "3" (dev dependency for tests)

Kept:

  • All Axum web framework deps
  • reqwest (for Cosmian HTTP API)
  • base64, serde, tokio, etc.

Migration Path

For Development

# 1. Install Age
brew install age  # or apt install age

# 2. Generate keys (create the key directory first; age-keygen does not create it)
mkdir -p ~/.config/provisioning/age
age-keygen -o ~/.config/provisioning/age/private_key.txt
age-keygen -y ~/.config/provisioning/age/private_key.txt > ~/.config/provisioning/age/public_key.txt

# 3. Update config to use Age backend
# 4. Re-encrypt development secrets

For Production

# 1. Set up Cosmian KMS (cloud or self-hosted)
# 2. Create master key in Cosmian
# 3. Migrate secrets from Vault/AWS to Cosmian
# 4. Update production config
# 5. Deploy new KMS service

See docs/migration/KMS_SIMPLIFICATION.md for detailed steps.
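Step 3 of the production path amounts to decrypting each secret with the old backend and re-encrypting it with the new one. Below is a minimal sketch of that loop, assuming a hypothetical `Cipher` trait and `reencrypt_all` helper. The `XorCipher` stand-in is deliberately insecure and exists only so the example runs without a Vault, AWS, or Cosmian server; the authoritative procedure is the migration guide.

```rust
use std::collections::BTreeMap;

/// Minimal interface assumed for both the old and the new backend.
pub trait Cipher {
    fn encrypt(&self, plaintext: &[u8]) -> Vec<u8>;
    fn decrypt(&self, ciphertext: &[u8]) -> Vec<u8>;
}

/// Insecure stand-in (single-byte XOR) used purely for illustration.
pub struct XorCipher(pub u8);

impl Cipher for XorCipher {
    fn encrypt(&self, plaintext: &[u8]) -> Vec<u8> {
        plaintext.iter().map(|b| b ^ self.0).collect()
    }

    fn decrypt(&self, ciphertext: &[u8]) -> Vec<u8> {
        ciphertext.iter().map(|b| b ^ self.0).collect()
    }
}

/// Decrypts every stored secret with `old` and re-encrypts it with `new`.
/// The input map is left untouched so the old ciphertexts remain available
/// until the migrated set has been verified.
pub fn reencrypt_all(
    secrets: &BTreeMap<String, Vec<u8>>,
    old: &dyn Cipher,
    new: &dyn Cipher,
) -> BTreeMap<String, Vec<u8>> {
    secrets
        .iter()
        .map(|(name, ciphertext)| {
            let plaintext = old.decrypt(ciphertext);
            (name.clone(), new.encrypt(&plaintext))
        })
        .collect()
}
```

Returning a new map instead of mutating in place keeps a rollback path open: the old ciphertexts can be discarded only after the new backend has been shown to decrypt every migrated entry.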

Alternatives Considered

Alternative 1: Keep All 4 Backends

Pros:

  • No migration required
  • Maximum flexibility

Cons:

  • Continued complexity
  • Maintenance burden
  • Unclear guidance

Rejected: Complexity outweighs benefits

Alternative 2: Only Cosmian (No Age)

Pros:

  • Single backend
  • Enterprise-grade everywhere

Cons:

  • Requires Cosmian server for development
  • Slower dev iteration
  • Network dependency for local dev

Rejected: Development experience matters

Alternative 3: Only Age (No Production Backend)

Pros:

  • Simplest solution
  • No server required

Cons:

  • Not suitable for production
  • No audit logging
  • No key rotation
  • No multi-tenant support

Rejected: Production needs enterprise features

Alternative 4: Age + HashiCorp Vault

Pros:

  • Vault is widely known
  • No Cosmian dependency

Cons:

  • Vault lacks confidential computing
  • Vault server still required
  • No zero-knowledge architecture

Rejected: Cosmian provides better security features

Metrics

Code Reduction

  • Total Lines Removed: ~800 lines (AWS + Vault implementations)
  • Total Lines Added: ~470 lines (Age + Cosmian implementations; the 500+ line migration guide is not counted)
  • Net Reduction: ~330 lines

Dependency Reduction

  • Crates Removed: 4 (aws-sdk-kms, aws-config, aws-credential-types, aes-gcm)
  • Crates Added: 1 (age), plus tempfile as a dev-only dependency
  • Net Reduction: 3 runtime crates

Compilation Time

  • Before: ~90 seconds (with AWS SDK)
  • After: ~60 seconds (without AWS SDK)
  • Improvement: ~30 seconds saved, about 33% less compile time

Compliance

Security Considerations

  1. Age Security: X25519 (Curve25519) key agreement with ChaCha20-Poly1305 payload encryption, modern and secure
  2. Cosmian Security: Confidential computing, zero-knowledge, enterprise-grade
  3. No Regression: Security features maintained or improved
  4. Clear Separation: Age (dev) is never used for production secrets
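The dev/prod separation could be enforced at startup rather than by convention alone. Below is a minimal sketch of such a check, assuming hypothetical `Environment` and `Backend` enums and a `validate_backend` function; none of these names come from the service's source.

```rust
/// Deployment environment (hypothetical type, for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Environment {
    Development,
    Test,
    Production,
}

/// The two supported backends.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Age,
    Cosmian,
}

/// Rejects the one combination the ADR rules out: Age in production.
/// Every other pairing is allowed, so Cosmian can still be exercised in dev or test.
pub fn validate_backend(env: Environment, backend: Backend) -> Result<(), String> {
    match (env, backend) {
        (Environment::Production, Backend::Age) => Err(
            "Age backend is for development and testing only; \
             configure Cosmian KMS for production"
                .to_string(),
        ),
        _ => Ok(()),
    }
}
```

Calling a check like this once, when configuration is loaded, turns a policy statement into a hard failure before any production secret is touched.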

Testing Requirements

  1. Unit Tests: Both backends have comprehensive test coverage
  2. Integration Tests: Age tests run without external deps
  3. Cosmian Tests: Require test server (marked as #[ignore])
  4. Migration Tests: Verify old configs fail gracefully
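For the migration tests, "failing gracefully" can be read as returning a descriptive error that names the removed backend and points to the migration guide, instead of panicking or silently falling back. The sketch below works under that assumption; `parse_backend`, `ConfigError`, and the backend strings `"vault"`, `"aws"`, and `"aws-kms"` are hypothetical and may not match the real config schema.

```rust
use std::fmt;

/// Backends still supported after the simplification.
#[derive(Debug, PartialEq, Eq)]
pub enum Backend {
    Age,
    Cosmian,
}

/// Errors a config loader could return (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum ConfigError {
    /// The backend existed before ADR-007 and has been removed.
    RemovedBackend(String),
    /// The backend name was never valid.
    UnknownBackend(String),
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConfigError::RemovedBackend(name) => write!(
                f,
                "backend '{}' was removed by ADR-007; see docs/migration/KMS_SIMPLIFICATION.md",
                name
            ),
            ConfigError::UnknownBackend(name) => {
                write!(f, "unknown backend '{}'; expected 'age' or 'cosmian'", name)
            }
        }
    }
}

/// Parses a backend name, distinguishing removed backends from typos.
pub fn parse_backend(name: &str) -> Result<Backend, ConfigError> {
    match name.trim().to_ascii_lowercase().as_str() {
        "age" => Ok(Backend::Age),
        "cosmian" => Ok(Backend::Cosmian),
        legacy @ ("vault" | "aws" | "aws-kms") => {
            Err(ConfigError::RemovedBackend(legacy.to_string()))
        }
        other => Err(ConfigError::UnknownBackend(other.to_string())),
    }
}
```

Separating `RemovedBackend` from `UnknownBackend` lets the error message tell an operator with an old config exactly where to look, which a generic parse failure would not.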

References

Notes

  • Age was designed by Filippo Valsorda (formerly of Google's Go security team) and Ben Cartwright-Cox
  • Cosmian provides FIPS 140-2 Level 3 compliance (when using certified hardware)
  • This decision aligns with project goal of reducing cloud provider dependencies
  • Migration timeline: 6 weeks for full adoption