2025-12-26 15:13:36 +00:00

# SecretumVault Architecture
Complete system architecture, design decisions, and component interactions.
## Table of Contents
1. [System Overview](#system-overview)
2. [Core Components](#core-components)
3. [Request Flow](#request-flow)
4. [Configuration-Driven Design](#configuration-driven-design)
5. [Registry Pattern](#registry-pattern)
6. [Storage Layer](#storage-layer)
7. [Cryptography Layer](#cryptography-layer)
8. [Secrets Engines](#secrets-engines)
9. [Authorization & Policies](#authorization--policies)
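10. [Deployment Architecture](#deployment-architecture)
11. [Data Flow Diagram](#data-flow-diagram)
12. [Performance Characteristics](#performance-characteristics)
13. [Security Architecture](#security-architecture)
14. [Extension Points](#extension-points)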
---
## System Overview
SecretumVault is a **config-driven, async-first secrets management system** built on:
- **Rust + Tokio**: Type-safe async runtime
- **Axum**: High-performance HTTP framework
- **Trait-based polymorphism**: Pluggable backends
- **Registry pattern**: Type-safe factory dispatch
- **Cedar**: Attribute-based access control (ABAC)
- **Post-quantum cryptography**: Optional, feature-gated ML-KEM and ML-DSA support
### Design Philosophy
```
┌─────────────────────────────────────────────────────┐
│ Config-Driven: WHAT to use                          │
│ (backend selection, engine mounting)                │
└────────────────┬────────────────────────────────────┘
                 ▼
┌─────────────────────────────────────────────────────┐
│ Registry Pattern: HOW to create it                  │
│ (type-safe dispatch from config string)             │
└────────────────┬────────────────────────────────────┘
                 ▼
┌─────────────────────────────────────────────────────┐
│ Trait Abstraction: INTERFACE definition             │
│ (StorageBackend, CryptoBackend, Engine)             │
└────────────────┬────────────────────────────────────┘
                 ▼
┌─────────────────────────────────────────────────────┐
│ Concrete Implementations: ACTUAL code               │
│ (etcd, PostgreSQL, OpenSSL, AWS-LC)                 │
└─────────────────────────────────────────────────────┘
```
**Benefit**: Adding a backend means implementing the trait, adding one match arm to the registry, and updating the config. Existing backends, engines, and handlers stay untouched.
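The four layers can be sketched in a few lines of Rust. This is a minimal, synchronous illustration under stated assumptions, not the project's code: the `Storage` trait, `MemoryBackend`, and `create_storage` are hypothetical stand-ins for `StorageBackend`, a concrete backend, and `StorageRegistry::create`.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// INTERFACE: a deliberately tiny, synchronous stand-in for `StorageBackend`.
pub trait Storage: Send + Sync {
    fn get(&self, key: &str) -> Option<Vec<u8>>;
    fn set(&self, key: &str, value: Vec<u8>);
}

/// ACTUAL code: a hypothetical in-memory backend used only for this sketch.
#[derive(Default)]
pub struct MemoryBackend {
    data: Mutex<HashMap<String, Vec<u8>>>,
}

impl Storage for MemoryBackend {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.data.lock().unwrap().get(key).cloned()
    }

    fn set(&self, key: &str, value: Vec<u8>) {
        self.data.lock().unwrap().insert(key.to_string(), value);
    }
}

/// HOW: the registry maps the config string (WHAT) to a constructor.
pub fn create_storage(backend: &str) -> Result<Arc<dyn Storage>, String> {
    match backend {
        "memory" => Ok(Arc::new(MemoryBackend::default())),
        // A new backend adds one arm here; callers only see `dyn Storage`.
        unknown => Err(format!("unknown storage backend: {}", unknown)),
    }
}
```

Callers hold an `Arc<dyn Storage>` and never name the concrete type, which is why swapping backends is a configuration change rather than a code change at the call sites.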
---
## Core Components
### VaultCore
Central coordinator managing all vault operations.
```rust
pub struct VaultCore {
// Storage for encrypted secrets and metadata
pub storage: Arc<dyn StorageBackend>,
// Cryptographic operations (encrypt/decrypt/sign/verify)
pub crypto: Arc<dyn CryptoBackend>,
// Authentication tokens and TTL management
pub auth_manager: Arc<AuthManager>,
// Cedar policy engine for fine-grained access control
pub cedar_engine: Arc<CedarEngine>,
// Mounted secret engines (KV, Transit, PKI, Database, etc.)
pub engines: HashMap<String, Box<dyn Engine>>,
// Seal/unseal state and master key encryption
pub seal_manager: Arc<SealManager>,
// Metrics collection (Prometheus-compatible)
pub metrics: Arc<Metrics>,
// Configuration (static, loaded once at startup)
pub config: VaultConfig,
}
```
**Initialization**:
```rust
impl VaultCore {
    pub async fn from_config(config: VaultConfig) -> Result<Self> {
        // 1. Validate configuration (fail fast)
        config.validate()?;

        // 2. Create storage backend from config
        let storage = StorageRegistry::create(&config.storage).await?;

        // 3. Create crypto backend from config
        let crypto = CryptoRegistry::create(&config.vault.crypto_backend, &config.crypto)?;

        // 4. Initialize seal/unseal manager
        //    (wrapped in Arc to match the VaultCore field type,
        //    assuming `new` returns the bare value)
        let seal_manager = Arc::new(SealManager::new(crypto.clone()));

        // 5. Mount secret engines from config
        //    (the annotation lets Box<KVEngine> coerce to Box<dyn Engine>)
        let mut engines: HashMap<String, Box<dyn Engine>> = HashMap::new();
        if let Some(kv_cfg) = &config.engines.kv {
            engines.insert(
                kv_cfg.path.clone(),
                Box::new(KVEngine::new(kv_cfg, storage.clone())?),
            );
        }

        // 6. Create auth manager and Cedar engine
        let auth_manager = Arc::new(AuthManager::new(storage.clone()));
        let cedar_engine = Arc::new(CedarEngine::new(&config.auth)?);

        Ok(Self {
            storage,
            crypto,
            auth_manager,
            cedar_engine,
            engines,
            seal_manager,
            metrics: Arc::new(Metrics::new()),
            config,
        })
    }
}
```
### API Server
Axum-based HTTP server with middleware stack.
```
HTTP Request
    ↓
[Axum Router]
    ↓
[Auth Middleware] - Validate X-Vault-Token
    ↓
[Cedar Middleware] - Evaluate policy (permit/forbid)
    ↓
[Request Handler] - Route to appropriate engine
    ↓
[Engine Implementation] - Process request
    ↓
[Storage/Crypto] - Persist/encrypt
    ↓
HTTP Response
    ↓
[Metrics] - Record operation
    ↓
[Audit Log] - Log to storage
```
**Routing**:
```rust
pub fn build_router(vault: Arc<VaultCore>) -> Router {
    let mut router = Router::new()
        // System endpoints
        .route("/v1/sys/init", post(sys::init))
        .route("/v1/sys/unseal", post(sys::unseal))
        .route("/v1/sys/health", get(sys::health))
        .route("/v1/sys/seal-status", get(sys::seal_status));

    // Mount dynamic routes from engines
    for (path, engine) in &vault.engines {
        router = router.nest(&format!("/v1/{}", path), engine.routes());
    }

    router
        // Axum runs the most recently added layer first, so the Cedar layer
        // is added before the auth layer to get auth -> Cedar -> handler.
        .layer(middleware::from_fn_with_state(vault.clone(), cedar_authz_middleware))
        .layer(middleware::from_fn_with_state(vault.clone(), auth_middleware))
        .with_state(vault)
}
```
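Layer order matters here: with repeated `.layer()` calls, Axum wraps the router from the inside out, so the layer added last sees the request first. The sketch below models that wrapping with plain closures. It is a toy model, not Axum code; `Handler`, `layer`, and `request_trace` are hypothetical names.

```rust
/// A "service" in this toy model just records who saw the request.
type Handler = Box<dyn Fn(&mut Vec<&'static str>)>;

/// Wrap `inner` so that `name` runs first, then delegates inward.
fn layer(name: &'static str, inner: Handler) -> Handler {
    Box::new(move |trace: &mut Vec<&'static str>| {
        trace.push(name);
        inner(trace);
    })
}

/// Build handler -> cedar -> auth (auth added last) and trace one request.
pub fn request_trace() -> Vec<&'static str> {
    let handler: Handler = Box::new(|trace: &mut Vec<&'static str>| trace.push("handler"));
    let with_cedar = layer("cedar", handler); // added first: innermost
    let with_auth = layer("auth", with_cedar); // added last: outermost
    let mut trace = Vec::new();
    with_auth(&mut trace);
    trace
}
```

The trace comes out as auth, then cedar, then the handler, which is the order the middleware diagram calls for.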
---
## Request Flow
### Secret Read Request
```
1. Client:
curl -H "X-Vault-Token: $TOKEN" \
http://localhost:8200/v1/secret/data/myapp
2. Server receives request
3. Auth Middleware:
- Extract token from header
- Lookup token in storage
- Validate TTL (not expired)
- Extract token metadata (principal, ttl, policies)
4. Cedar Middleware:
- Build context: principal={token_id}, action=read, resource=/secret/data/myapp
- Evaluate policies: cedar_engine.evaluate(context)
- Result: Allow / Deny
- If Deny: Return 403 Forbidden
5. Route Handler:
- Parse request path: /v1/secret/data/myapp
- Find mounted engine: KVEngine at /secret/
- Delegate to engine.handle_request()
6. KV Engine:
- Extract secret path: myapp
- Call storage.get("secret:metadata:myapp"), then fetch the current version (e.g. "secret:v1:myapp")
7. Storage Backend (etcd/postgres/etc):
- Lookup encrypted secret blob
- Return to engine
8. KV Engine (decrypt):
- Get master key from seal_manager
- Call crypto.decrypt(blob, master_key)
- Return plaintext metadata + versions
9. Response:
- Build JSON response
- Record metrics.secrets_read.inc()
- Log to audit: {principal, action, resource, result}
- Return 200 OK with secret data
```
### Secret Write Request
```
Similar to read, but:
1. Auth → Cedar policy evaluation (write policy)
2. Engine handler parses request body (secret data)
3. Encryption:
- Get master key from seal_manager
- crypto.encrypt(plaintext, master_key) → ciphertext
4. Storage: store(ciphertext, metadata)
5. Return 201 Created or 204 No Content
6. Metrics/Audit: Record write operation
```
---
## Configuration-Driven Design
All runtime behavior is determined by `svault.toml`:
### Configuration Hierarchy
```
VaultConfig (root)
├── [vault] section
│ ├── crypto_backend = "openssl"
│ └── (global settings)
├── [server] section
│ ├── address = "0.0.0.0"
│ ├── port = 8200
│ └── (TLS settings)
├── [storage] section
│ ├── backend = "etcd"
│ └── [storage.etcd]
│ └── endpoints = ["http://localhost:2379"]
├── [crypto] section
│ └── (crypto-specific settings)
├── [seal] section
│ ├── seal_type = "shamir"
│ └── [seal.shamir]
│ ├── threshold = 2
│ └── shares = 3
├── [engines] section
│ ├── [engines.kv]
│ │ ├── path = "secret/"
│ │ └── versioned = true
│ ├── [engines.transit]
│ │ └── path = "transit/"
│ └── (other engines)
├── [logging] section
│ ├── level = "info"
│ └── format = "json"
├── [telemetry] section
│ ├── prometheus_port = 9090
│ └── enable_trace = false
└── [auth] section
└── default_ttl = 24
```
### Configuration Validation
Validation at startup (fail-fast):
```rust
impl VaultConfig {
    pub fn validate(&self) -> Result<()> {
        // 1. Check backend availability
        if !CryptoRegistry::is_available(&self.vault.crypto_backend) {
            return Err(ConfigError::UnavailableBackend(
                self.vault.crypto_backend.clone(),
            ));
        }
        // 2. Check path collisions
        let mut paths = HashSet::new();
        for engine_cfg in self.engines.all_engines() {
            if !paths.insert(engine_cfg.path.clone()) {
                return Err(ConfigError::DuplicatePath(engine_cfg.path.clone()));
            }
        }
        // 3. Validate seal threshold (fields live under [seal.shamir])
        let shamir = &self.seal.shamir;
        if shamir.threshold == 0 || shamir.threshold > shamir.shares {
            return Err(ConfigError::InvalidSealConfig);
        }
        // 4. Check required fields for the selected storage backend
        if self.storage.backend == "etcd" && self.storage.etcd.endpoints.is_empty() {
            return Err(ConfigError::MissingField("storage.etcd.endpoints"));
        }
        Ok(())
    }
}
```
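Two of these checks are easy to isolate and test on their own. The following is a minimal sketch, assuming a trimmed-down `ConfigError` and free functions (`check_mount_paths`, `check_shamir`) that are hypothetical and exist only for illustration.

```rust
use std::collections::HashSet;

/// Trimmed-down error type for this sketch only.
#[derive(Debug, PartialEq)]
pub enum ConfigError {
    DuplicatePath(String),
    InvalidSealConfig,
}

/// Reject two engines mounted at the same path.
pub fn check_mount_paths(paths: &[&str]) -> Result<(), ConfigError> {
    let mut seen = HashSet::new();
    for path in paths {
        // `insert` returns false when the value was already present.
        if !seen.insert(*path) {
            return Err(ConfigError::DuplicatePath((*path).to_string()));
        }
    }
    Ok(())
}

/// A Shamir split needs 1 <= threshold <= shares.
pub fn check_shamir(threshold: u8, shares: u8) -> Result<(), ConfigError> {
    if threshold == 0 || threshold > shares {
        return Err(ConfigError::InvalidSealConfig);
    }
    Ok(())
}
```

With the sample configuration (`threshold = 2`, `shares = 3`, mounts `secret/` and `transit/`), both checks pass; a second engine mounted at `secret/` would fail at startup instead of at the first request.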
---
## Registry Pattern
Type-safe backend factory pattern.
### Storage Registry
```rust
pub struct StorageRegistry;
impl StorageRegistry {
pub async fn create(config: &StorageConfig) -> Result<Arc<dyn StorageBackend>> {
match config.backend.as_str() {
"filesystem" => {
Ok(Arc::new(FilesystemBackend::new(&config)?))
}
"etcd" => {
Ok(Arc::new(EtcdBackend::new(&config.etcd).await?))
}
"surrealdb" => {
Ok(Arc::new(SurrealDBBackend::new(&config.surrealdb).await?))
}
"postgresql" => {
Ok(Arc::new(PostgreSQLBackend::new(&config.postgresql).await?))
}
unknown => Err(ConfigError::UnknownBackend(unknown.to_string()))
}
}
}
```
### Crypto Registry
```rust
pub struct CryptoRegistry;
impl CryptoRegistry {
pub fn create(backend: &str, config: &CryptoConfig) -> Result<Arc<dyn CryptoBackend>> {
match backend {
"openssl" => Ok(Arc::new(OpenSSLBackend::new()?)),
"aws-lc" => {
#[cfg(feature = "aws-lc")]
return Ok(Arc::new(AwsLcBackend::new()?));
#[cfg(not(feature = "aws-lc"))]
return Err(ConfigError::FeatureNotEnabled("aws-lc"));
}
"rustcrypto" => {
#[cfg(feature = "rustcrypto")]
return Ok(Arc::new(RustCryptoBackend::new()?));
#[cfg(not(feature = "rustcrypto"))]
return Err(ConfigError::FeatureNotEnabled("rustcrypto"));
}
unknown => Err(ConfigError::UnknownBackend(unknown.to_string()))
}
}
}
```
### Engine Registry
```rust
pub struct EngineRegistry;
impl EngineRegistry {
    // Takes the backends directly: VaultCore owns the resulting map,
    // so it cannot exist yet when engines are mounted.
    pub fn mount_engines(
        config: &EnginesConfig,
        storage: &Arc<dyn StorageBackend>,
        crypto: &Arc<dyn CryptoBackend>,
    ) -> Result<HashMap<String, Box<dyn Engine>>> {
        // The annotation lets each Box<ConcreteEngine> coerce to Box<dyn Engine>.
        let mut engines: HashMap<String, Box<dyn Engine>> = HashMap::new();
        // Mount KV engine
        if let Some(kv_cfg) = &config.kv {
            engines.insert(
                kv_cfg.path.clone(),
                Box::new(KVEngine::new(kv_cfg, storage.clone())?),
            );
        }
        // Mount Transit engine
        if let Some(transit_cfg) = &config.transit {
            engines.insert(
                transit_cfg.path.clone(),
                Box::new(TransitEngine::new(transit_cfg, crypto.clone())?),
            );
        }
        // Mount PKI engine
        if let Some(pki_cfg) = &config.pki {
            engines.insert(
                pki_cfg.path.clone(),
                Box::new(PKIEngine::new(pki_cfg, crypto.clone())?),
            );
        }
        // Mount Database engine
        if let Some(db_cfg) = &config.database {
            engines.insert(
                db_cfg.path.clone(),
                Box::new(DatabaseEngine::new(db_cfg, storage.clone())?),
            );
        }
        Ok(engines)
    }
}
```
---
## Storage Layer
### StorageBackend Trait
```rust
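// `async fn` in a trait used as `dyn StorageBackend` needs async-trait
// (native async trait methods are not dyn-compatible).
#[async_trait::async_trait]
pub trait StorageBackend: Send + Sync {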
// Key-value operations
async fn get(&self, key: &str) -> StorageResult<Option<Vec<u8>>>;
async fn set(&self, key: &str, value: Vec<u8>) -> StorageResult<()>;
async fn delete(&self, key: &str) -> StorageResult<()>;
// Listing and querying
async fn list(&self, prefix: &str) -> StorageResult<Vec<String>>;
async fn exists(&self, key: &str) -> StorageResult<bool>;
// Atomic operations
async fn cas(&self, key: &str, old: Option<Vec<u8>>, new: Vec<u8>)
-> StorageResult<bool>;
// Transactions
async fn transaction(&self, ops: Vec<StorageOp>)
-> StorageResult<Vec<StorageResult<()>>>;
}
```
### Storage Key Organization
Keys are namespaced by purpose:
```
Direct secret storage:
secret:metadata:myapp → Metadata (path, versions, timestamps)
secret:v1:myapp → Version 1 (encrypted data)
secret:v2:myapp → Version 2 (encrypted data)
Token storage:
auth:tokens:token_abc123 → Token metadata (TTL, policies)
auth:leases:lease_id → Active lease info
Engine-specific:
pki:roots:root-ca → PKI root certificate
pki:roles:my-role → PKI role configuration
db:credentials:postgres-prod → Generated credentials
transit:keys:my-key → Transit encryption key
Internal:
vault:config:shamir → Shamir threshold and shares
vault:master:encrypted_key → Encrypted master key
```
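Centralizing key construction keeps the namespace layout in one place. A minimal sketch, assuming hypothetical helper names (`secret_metadata_key`, `secret_version_key`, `token_key`, `parse_version_key`) that follow the layout listed above:

```rust
/// Key for a KV secret's metadata record, e.g. `secret:metadata:myapp`.
pub fn secret_metadata_key(path: &str) -> String {
    format!("secret:metadata:{}", path)
}

/// Key for one encrypted version of a KV secret, e.g. `secret:v2:myapp`.
pub fn secret_version_key(version: u64, path: &str) -> String {
    format!("secret:v{}:{}", version, path)
}

/// Key for a token's metadata, e.g. `auth:tokens:token_abc123`.
pub fn token_key(token_id: &str) -> String {
    format!("auth:tokens:{}", token_id)
}

/// Inverse of `secret_version_key`: returns (version, path) or None
/// when the key is not a version key.
pub fn parse_version_key(key: &str) -> Option<(u64, &str)> {
    let rest = key.strip_prefix("secret:v")?;
    let (version, path) = rest.split_once(':')?;
    let version: u64 = version.parse().ok()?;
    Some((version, path))
}
```

Note that the version number sits before the secret path in this layout, so `list("secret:v1:")` enumerates all secrets at version 1 rather than all versions of one secret; the metadata record is what ties a secret's versions together.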
### Concurrent Access
Individual storage operations are atomic. Writers coordinate through optimistic compare-and-swap instead of distributed locks:
```
Write Operation:
1. Read current value (with version)
2. Modify in-memory
3. CAS (compare-and-swap) write:
- If version matches → Write succeeds
- If version mismatch → Retry from step 1
Read Operation:
- Simple get() call
- No locking, readers don't block writers
```
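The write path above is a standard optimistic retry loop. The sketch below shows it against a hypothetical in-memory store (`MemStore`) whose `cas` mirrors the trait's contract of comparing against the previously read value; the counter encoding is an assumption made only to have something to modify.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical in-memory store with a compare-and-swap primitive.
#[derive(Default)]
pub struct MemStore {
    data: Mutex<HashMap<String, Vec<u8>>>,
}

impl MemStore {
    pub fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.data.lock().unwrap().get(key).cloned()
    }

    /// Write `new` only if the current value still equals `old`.
    pub fn cas(&self, key: &str, old: Option<&[u8]>, new: Vec<u8>) -> bool {
        let mut data = self.data.lock().unwrap();
        if data.get(key).map(|v| v.as_slice()) != old {
            return false; // another writer got there first
        }
        data.insert(key.to_string(), new);
        true
    }
}

/// Decode a decimal counter; missing or malformed values count as 0.
fn decode(bytes: &[u8]) -> u64 {
    std::str::from_utf8(bytes)
        .ok()
        .and_then(|s| s.parse::<u64>().ok())
        .unwrap_or(0)
}

/// Read, modify in memory, CAS; retry from the read when the CAS loses.
pub fn increment(store: &MemStore, key: &str) -> u64 {
    loop {
        let current = store.get(key);
        let next = current.as_deref().map(decode).unwrap_or(0) + 1;
        if store.cas(key, current.as_deref(), next.to_string().into_bytes()) {
            return next;
        }
    }
}
```

No update is lost under contention because a stale read can never be written back: the CAS fails and the loop re-reads. Readers never wait on writers beyond the duration of a single `get`.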
---
## Cryptography Layer
### CryptoBackend Trait
```rust
// async-trait keeps the trait usable as `Arc<dyn CryptoBackend>`.
#[async_trait::async_trait]
pub trait CryptoBackend: Send + Sync {
// Symmetric encryption (AES-256-GCM, ChaCha20-Poly1305)
async fn encrypt(&self, plaintext: &[u8], aad: &[u8])
-> CryptoResult<Ciphertext>;
async fn decrypt(&self, ciphertext: &Ciphertext, aad: &[u8])
-> CryptoResult<Vec<u8>>;
// Key generation
async fn generate_keypair(&self, algorithm: KeyAlgorithm)
-> CryptoResult<KeyPair>;
// Signing and verification (if supported)
async fn sign(&self, data: &[u8], key_id: &str)
-> CryptoResult<Signature>;
async fn verify(&self, data: &[u8], signature: &Signature)
-> CryptoResult<bool>;
// Hash operations
async fn hash(&self, data: &[u8], algorithm: HashAlgorithm)
-> CryptoResult<Vec<u8>>;
}
```
### Master Key Encryption
All secrets encrypted with master key:
```
Master Key (recovered from Shamir shares at unseal)
    ↓
Encrypt with AES-256-GCM (NIST SP 800-38D)
    ↓
Ciphertext + IV + Tag stored as the encrypted secret
```
### Post-Quantum Support
Feature-gated post-quantum algorithms:
```rust
pub enum KeyAlgorithm {
    // Classical
    Rsa2048, Rsa4096,
    EcdsaP256, EcdsaP384, EcdsaP521,
    // Post-quantum key encapsulation (ML-KEM, FIPS 203)
    #[cfg(feature = "pqc")]
    MlKem768,
    // Post-quantum signatures (ML-DSA, FIPS 204)
    #[cfg(feature = "pqc")]
    MlDsa65,
}
```
---
## Secrets Engines
### Engine Trait
```rust
// async-trait keeps the trait usable as `Box<dyn Engine>`.
#[async_trait::async_trait]
pub trait Engine: Send + Sync {
// Handle HTTP request for this engine
async fn handle_request(&self, req: EngineRequest)
-> EngineResult<EngineResponse>;
// Mount point (e.g., "secret/", "transit/")
fn mount_path(&self) -> &str;
// Engine type (for metrics and logging)
fn engine_type(&self) -> &str;
// Build Axum router for this engine's routes
fn routes(&self) -> Router;
}
```
### Engine Request Flow
```
HTTP Request: POST /v1/secret/data/myapp
    ↓
Router matches /secret/ prefix
    ↓
KVEngine::routes() router handles /data/myapp
    ↓
KVEngine::handle_request() called
    ↓
KVEngine processes:
  - Parse request body
  - Validate against storage
  - Encrypt/decrypt as needed
  - Call storage backend
  - Return response
    ↓
HTTP Response
```
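The prefix-matching step can be illustrated without Axum. This sketch resolves a request path (already stripped of `/v1/`) to a mount and the remainder handed to the engine. `resolve_mount` is a hypothetical helper, and preferring the longest matching mount is an assumption about how overlapping mounts should behave, not documented behavior.

```rust
/// Return the mount that owns `path` and the remainder passed to the engine,
/// preferring the longest matching mount when several prefixes match.
pub fn resolve_mount<'a>(mounts: &[&'a str], path: &'a str) -> Option<(&'a str, &'a str)> {
    mounts
        .iter()
        .filter(|m| path.starts_with(**m))
        .max_by_key(|m| m.len())
        .map(|m| (*m, &path[m.len()..]))
}
```

For the request above, `resolve_mount(&["secret/", "transit/"], "secret/data/myapp")` yields the `secret/` mount with `data/myapp` as the engine-relative path, and an unmounted prefix yields `None`, which a handler would map to 404.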
### KV Engine (Versioned)
```rust
pub struct KVEngine {
storage: Arc<dyn StorageBackend>,
config: KVEngineConfig,
crypto: Arc<dyn CryptoBackend>,
}
impl KVEngine {
    // Mount name without the trailing slash, so keys match the documented
    // layout (e.g. "secret:metadata:myapp" for the "secret/" mount)
    fn mount(&self) -> &str {
        self.config.path.trim_end_matches('/')
    }

    // Handle read request
    pub async fn read(&self, path: &str) -> EngineResult<SecretMetadata> {
        // 1. Get secret metadata (storage returns None for a missing key)
        let metadata_key = format!("{}:metadata:{}", self.mount(), path);
        let blob = self
            .storage
            .get(&metadata_key)
            .await?
            .ok_or(EngineError::NotFound)?; // assumed error variant
        // 2. Decrypt metadata
        let encrypted = Ciphertext::from_bytes(&blob)?; // assumed decoding helper
        let plaintext = self.crypto.decrypt(&encrypted, b"").await?;
        let metadata: SecretMetadata = serde_json::from_slice(&plaintext)?;
        Ok(metadata)
    }

    // Handle write request
    pub async fn write(&self, path: &str, data: Value) -> EngineResult<()> {
        // 1. Get or create metadata (read_metadata is a helper not shown here)
        let metadata_key = format!("{}:metadata:{}", self.mount(), path);
        let mut metadata = self.read_metadata(&metadata_key).await?;
        // 2. Create new version
        let version = metadata.versions.len() + 1;
        let version_key = format!("{}:v{}:{}", self.mount(), version, path);
        // 3. Encrypt version data
        let plaintext = serde_json::to_vec(&data)?;
        let encrypted = self.crypto.encrypt(&plaintext, b"").await?;
        // 4. Store version and update metadata
        //    (to_bytes is an assumed encoding helper on Ciphertext)
        self.storage.set(&version_key, encrypted.to_bytes()).await?;
        metadata.update(version, Utc::now());
        // 5. Store metadata
        let metadata_bytes = serde_json::to_vec(&metadata)?;
        let encrypted_metadata = self.crypto.encrypt(&metadata_bytes, b"").await?;
        self.storage.set(&metadata_key, encrypted_metadata.to_bytes()).await?;
        Ok(())
    }
}
```
---
## Authorization & Policies
### Cedar Integration
Cedar is AWS's open-source policy language:
```cedar
permit (
principal == User::"alice",
action == Action::"read",
resource == Secret::"secret/myapp"
) when {
context.ip_address.isInRange(ip("10.0.0.0/16"))
};
```
### Policy Evaluation Flow
```
HTTP Request
Extract principal: X-Vault-Token
Build Cedar context:
principal = Token(token_id, policies=[...])
action = "read"
resource = "/secret/data/myapp"
context = {
ip_address = "10.0.20.5",
timestamp = "2025-12-21T10:30:00Z"
}
Cedar engine evaluates: evaluate(context)
Decision:
- Allow → Proceed to engine (at least one permit matched, no forbid matched)
- Deny → Return 403 Forbidden (a forbid matched, or nothing matched: default deny)
```
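Cedar combines matching policies with two rules: deny by default, and any matching `forbid` overrides every `permit`. The sketch below models only that combination step. `Effect`, `Decision`, and `decide` are hypothetical types for illustration and are not the `cedar-policy` crate's API.

```rust
/// Effect declared by a policy whose scope and conditions matched the request.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Effect {
    Permit,
    Forbid,
}

/// Final authorization outcome.
#[derive(Debug, PartialEq)]
pub enum Decision {
    Allow,
    Deny,
}

/// Combine the effects of all matching policies:
/// Allow only if some permit matched and no forbid matched.
pub fn decide(matched: &[Effect]) -> Decision {
    let any_forbid = matched.iter().any(|e| *e == Effect::Forbid);
    let any_permit = matched.iter().any(|e| *e == Effect::Permit);
    if any_permit && !any_forbid {
        Decision::Allow
    } else {
        Decision::Deny
    }
}
```

An empty match set therefore yields `Deny`, which is why a token with no applicable policy receives 403 rather than falling through to the engine.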
### Token Lifecycle
```
Create:
1. Generate random token ID (32 bytes)
2. Create metadata: {policies, ttl, created_at, renewable}
3. Store encrypted in storage: auth:tokens:token_id
4. Return token to client
Validate:
1. Extract token from request header
2. Lookup in storage
3. Check TTL: if expired → invalid
4. Extract policies and principal info
Renew:
1. Validate token (not expired)
2. Update TTL: expires_at = now + renewal_period
3. Update in storage
Revoke:
1. Delete from storage
2. Invalidate any active leases
```
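The validate and renew steps reduce to simple timestamp arithmetic. A minimal sketch, assuming a hypothetical `TokenMeta` with an absolute `expires_at` in Unix seconds and the caller supplying `now`, which keeps the logic deterministic and testable:

```rust
/// Hypothetical token metadata, mirroring the fields listed above.
#[derive(Debug, Clone, PartialEq)]
pub struct TokenMeta {
    pub policies: Vec<String>,
    pub expires_at: u64, // Unix seconds
    pub renewable: bool,
}

/// A token is valid strictly before its expiry.
pub fn is_valid(token: &TokenMeta, now: u64) -> bool {
    now < token.expires_at
}

/// Extend the expiry; expired or non-renewable tokens are rejected.
pub fn renew(token: &mut TokenMeta, now: u64, renewal_period: u64) -> Result<(), &'static str> {
    if !is_valid(token, now) {
        return Err("token expired");
    }
    if !token.renewable {
        return Err("token not renewable");
    }
    token.expires_at = now.saturating_add(renewal_period);
    Ok(())
}
```

Renewal is checked against validity first, so an expired token cannot be revived; the client has to authenticate again.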
---
## Deployment Architecture
### Docker Compose (Local Development)
```
┌─────────────────────────────────────────────────────┐
│ Docker Compose Network │
│ (vault-network) │
├──────────────┬──────────────┬───────────┬────────────┤
│ │ │ │ │
▼ ▼ ▼ ▼ ▼
[vault:8200] [etcd:2379] [surrealdb:8000] [postgres:5432] [prometheus:9090]
(server) (storage) (alt-storage) (alt-storage) (monitoring)
```
### Kubernetes Cluster
```
┌──────────────────────────────────────────────────────────────┐
│                      Kubernetes Cluster                      │
│                                                              │
│  ┌────────────────────────────────────────────────────────┐  │
│  │                secretumvault Namespace                 │  │
│  │                                                        │  │
│  │  ┌─────────────┐ ┌──────────────┐ ┌─────────────────┐  │  │
│  │  │ vault:8200  │ │ etcd:2379    │ │ prometheus:9090 │  │  │
│  │  │ Deployment  │ │ StatefulSet  │ │ Deployment      │  │  │
│  │  │ (1 replica) │ │ (3 replicas) │ │ (1 replica)     │  │  │
│  │  └─────────────┘ └──────────────┘ └─────────────────┘  │  │
│  │         ↓               ↓                              │  │
│  │     [Service]      [Headless]                          │  │
│  │    vault:8200       etcd:2379                          │  │
│  │                 (peer discovery)                       │  │
│  │                                                        │  │
│  │  [ConfigMap] vault-config (svault.toml)                │  │
│  │  [RBAC] ServiceAccount, ClusterRole                    │  │
│  │  [PVC] Persistent storage for etcd                     │  │
│  │                                                        │  │
│  └────────────────────────────────────────────────────────┘  │
│                                                              │
└──────────────────────────────────────────────────────────────┘
```
### Helm Chart Structure
```
helm/secretumvault/
├── Chart.yaml # Chart metadata
├── values.yaml # Default values (90+ options)
├── templates/
│ ├── _helpers.tpl # Template functions
│ ├── deployment.yaml # Vault deployment
│ ├── service.yaml # Services
│ ├── configmap.yaml # Configuration
│ └── rbac.yaml # Security
```
---
## Data Flow Diagram
### Secret Storage Flow
```
User Request:
{"username": "admin", "password": "secret123"}
Auth Middleware validates token
Cedar policy evaluates (permit/forbid)
KV Engine write handler:
1. Parse request body
2. Generate metadata (created_at, version)
3. Serialize to JSON
Crypto Backend:
plaintext = b'{"username": "admin", ...}'
master_key = master key held by seal_manager (vault must already be unsealed)
ciphertext = aes_256_gcm.encrypt(plaintext, master_key)
→ ciphertext = [nonce(12B) | ciphertext | tag(16B)]
Storage Backend (etcd/postgres):
storage.set(
key = "secret:v1:myapp",
value = ciphertext
)
Metrics recorded:
vault_secrets_stored.inc()
Audit logged:
{
timestamp: "2025-12-21T10:30:00Z",
principal: "user:alice",
action: "write",
resource: "/secret/data/myapp",
result: "success"
}
```
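The stored blob layout `[nonce(12B) | ciphertext | tag(16B)]` can be parsed with two slice splits. This sketch only handles framing and performs no cryptography; `Envelope` and `split_envelope` are hypothetical names.

```rust
const NONCE_LEN: usize = 12; // AES-GCM nonce size used in the layout above
const TAG_LEN: usize = 16; // AES-GCM authentication tag size

/// Borrowed view over the three parts of a stored blob.
pub struct Envelope<'a> {
    pub nonce: &'a [u8],
    pub ciphertext: &'a [u8],
    pub tag: &'a [u8],
}

/// Split `nonce | ciphertext | tag`; None if the blob is too short
/// to contain both the nonce and the tag.
pub fn split_envelope<'a>(blob: &'a [u8]) -> Option<Envelope<'a>> {
    if blob.len() < NONCE_LEN + TAG_LEN {
        return None;
    }
    let (nonce, rest) = blob.split_at(NONCE_LEN);
    let (ciphertext, tag) = rest.split_at(rest.len() - TAG_LEN);
    Some(Envelope { nonce, ciphertext, tag })
}
```

Rejecting short blobs before decryption turns a truncated or corrupted record into a clean error instead of a slice panic.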
### Secret Retrieval Flow
```
User Request:
GET /v1/secret/data/myapp
Header: X-Vault-Token: token_abc123
Auth Middleware:
1. Extract token from header
2. storage.get("auth:tokens:token_abc123")
3. Verify not expired
Cedar Policy Engine:
context = {
principal: User(token_id, policies=[...]),
action: "read",
resource: "/secret/data/myapp",
ip_address: "10.0.20.5"
}
→ Evaluate policies → Decision: permit
KV Engine read handler:
1. Parse path: myapp
2. storage.get("secret:v1:myapp")
3. Returns encrypted ciphertext
Crypto Backend decrypt:
master_key = master key held by seal_manager (vault must already be unsealed)
plaintext = aes_256_gcm.decrypt(ciphertext, master_key)
→ {"username": "admin", "password": "secret123"}
Response:
{
"request_id": "req_123",
"data": {
"data": {"username": "admin", "password": "secret123"},
"metadata": {
"created_time": "2025-12-21T10:20:00Z",
"current_version": 1
}
}
}
Metrics & Audit:
vault_secrets_read.inc()
audit_log(success)
```
---
## Performance Characteristics
### Async/Await Foundation
All I/O operations use Tokio's non-blocking runtime:
- HTTP requests: Axum + Hyper (async)
- Database queries: sqlx (async driver)
- etcd operations: etcd_client (async)
- File operations: tokio::fs (async)
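Result: no request pins an OS thread while it waits on I/O, so a single node can serve **many concurrent requests**; the practical ceiling depends on the storage backend.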
### Caching Strategy
Limited in-memory caching for:
- Token metadata (refreshed on access)
- Policy evaluation (for frequently used policies)
- Crypto key material (loaded once, kept in memory)
### Lock Contention
Minimal contention design:
- Per-token locking only during TTL updates
- Storage backend handles internal consistency
- No distributed locks (CAS operations used instead)
---
## Security Architecture
### Secret Encryption
All secrets encrypted at rest:
```
Plaintext → Master Key → AES-256-GCM → Ciphertext
(with AAD)
```
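The master key is stored only in encrypted form (`vault:master:encrypted_key`) and can be recovered only by combining a threshold of Shamir's Secret Sharing shares at unseal (2 of 3 in the sample configuration).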
### Audit Trail
Complete operation audit:
```
Every operation logged:
- Principal (token ID)
- Action (read/write/delete)
- Resource (secret path)
- Result (success/failure)
- Timestamp
- IP address
- Error details
```
### Policy Enforcement
Cedar policies enforce:
- **Who** can access (principal matching)
- **What** they can do (action authorization)
- **Where** they access (resource paths)
- **When** they access (time windows)
- **How** they access (IP ranges, MFA)
---
## Extension Points
### Adding New Storage Backend
1. Implement `StorageBackend` trait
2. Add to `StorageRegistry::create()`
3. Add feature flag in Cargo.toml
4. Update configuration schema
Example: To add S3 backend, implement trait with get/set/delete/list methods, add to registry match statement, add feature flag, update config TOML schema.
### Adding New Secrets Engine
1. Implement `Engine` trait
2. Add to `EngineRegistry::mount_engines()`
3. Implement Axum routes
4. Add to configuration
Example: To add SSH engine, create new file, implement Engine trait with handle_request, add Axum router methods, integrate into registry.
---
**Architecture validated**: Config-driven design enables flexible deployment while maintaining type safety and performance.