Security Considerations for Control Center Enhancements
Overview
This document outlines the security architecture and considerations for the control-center enhancements, including KMS SSH key management, mode-based RBAC, and platform service monitoring.
1. SSH Key Management Security
1.1 Key Storage Security
Implementation:
- Private keys encrypted at rest using AES-256-GCM in KMS
- Public keys stored in plaintext (as they are meant to be public)
- Private key material never exposed in API responses
- Key IDs used as references, not actual keys
Threat Mitigation:
- ✅ Data at Rest: All private keys encrypted with master encryption key
- ✅ Key Exposure: Private keys only decrypted in memory when needed
- ✅ Key Leakage: Zeroization of key material after use
- ✅ Unauthorized Access: KMS access controlled by RBAC
Best Practices:
// Good: Using key ID reference
let key_id = ssh_key_manager.store_ssh_key(name, private_key, public_key, purpose, tags).await?;
tracing::info!("Stored key: {}", key_id); // Safe: the ID is a reference, not key material
// Bad: Never do this - exposing private key in logs
tracing::info!("Stored key: {}", private_key); // DON'T DO THIS
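The zeroization and no-logging points above can be sketched with a small wrapper type. This is a minimal illustration, not the control center's actual type: `SecretBytes`, `expose`, and `wipe` are hypothetical names, and a production build would more likely rely on a vetted crate such as `zeroize` than on hand-written volatile writes.

```rust
use std::fmt;
use std::sync::atomic::{compiler_fence, Ordering};

/// Hypothetical holder for decrypted private key material.
pub struct SecretBytes(Vec<u8>);

impl SecretBytes {
    pub fn new(bytes: Vec<u8>) -> Self {
        SecretBytes(bytes)
    }

    /// Borrow the raw bytes only for as long as an operation needs them.
    pub fn expose(&self) -> &[u8] {
        &self.0
    }

    /// Overwrite the buffer with zeros. Volatile writes discourage the
    /// compiler from removing the wipe as a dead store.
    pub fn wipe(&mut self) {
        for byte in self.0.iter_mut() {
            // SAFETY: `byte` is a valid, aligned, exclusive reference.
            unsafe { std::ptr::write_volatile(byte, 0) };
        }
        compiler_fence(Ordering::SeqCst);
    }
}

impl Drop for SecretBytes {
    // Wipe automatically when the value goes out of scope.
    fn drop(&mut self) {
        self.wipe();
    }
}

// Debug output never contains the key bytes, so an accidental `{:?}`
// in a log statement prints only a redacted placeholder.
impl fmt::Debug for SecretBytes {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "SecretBytes([REDACTED; {} bytes])", self.0.len())
    }
}
```

Because the type deliberately does not implement `Display`, the "bad" logging line shown earlier would not compile if `private_key` were a `SecretBytes`.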
1.2 Key Rotation Security
Implementation:
- Configurable rotation intervals (default 90 days)
- Grace period for old key usage (default 7 days)
- Automatic rotation scheduling (if enabled)
- Manual rotation support with immediate effect
Threat Mitigation:
- ✅ Key Compromise: Regular rotation limits exposure window
- ✅ Stale Keys: Automated detection of keys due for rotation
- ✅ Rotation Failures: Graceful degradation with error logging
Rotation Policy:
[kms.ssh_keys]
rotation_enabled = true
rotation_interval_days = 90 # Enterprise: 30, Dev: 180
grace_period_days = 7 # Time to update deployed keys
auto_rotate = false # Manual approval recommended
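How the two day counts in this policy interact can be sketched as follows. The struct mirrors the config keys above, but the method names and the boundary choices (a key is due on day 90 exactly; an old key stops being accepted on day 7 exactly) are assumptions made for illustration.

```rust
/// Mirrors the `[kms.ssh_keys]` settings; field names follow the config keys.
pub struct RotationPolicy {
    pub rotation_enabled: bool,
    pub rotation_interval_days: u32,
    pub grace_period_days: u32,
}

impl RotationPolicy {
    /// True once an active key has reached the rotation interval.
    /// Always false when rotation is disabled.
    pub fn is_due(&self, key_age_days: u32) -> bool {
        self.rotation_enabled && key_age_days >= self.rotation_interval_days
    }

    /// True while a superseded key may still be used, counted from the
    /// day it was rotated. Assumption: the grace period is exclusive of
    /// its last day, so a 7-day grace covers days 0 through 6.
    pub fn old_key_accepted(&self, days_since_rotation: u32) -> bool {
        days_since_rotation < self.grace_period_days
    }
}
```

With `auto_rotate = false`, a check like `is_due` would only flag the key for an operator; the grace period starts when the rotation is actually performed.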
1.3 Audit Logging
Logged Events:
- SSH key creation (who, when, purpose)
- SSH key retrieval (who accessed, when)
- SSH key rotation (old key ID, new key ID)
- SSH key deletion (who deleted, when)
- Failed access attempts
Audit Entry Structure:
pub struct SshKeyAuditEntry {
pub timestamp: DateTime<Utc>,
pub key_id: String,
pub action: SshKeyAction,
pub user: Option<String>, // User who performed action
pub ip_address: Option<String>, // Source IP
pub success: bool,
pub error_message: Option<String>,
}
Threat Mitigation:
- ✅ Unauthorized Access: Full audit trail for forensics
- ✅ Insider Threats: User attribution for all actions
- ✅ Compliance: Audit trail supports GDPR and SOC 2 audit log requirements
Audit Log Retention:
- In-memory: Last 10,000 entries
- Persistent: SurrealDB with 1-year retention
- Compliance mode: 7-year retention (configurable)
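The in-memory tier ("last 10,000 entries") behaves like a bounded queue that evicts its oldest entry. A minimal sketch, assuming a hypothetical `AuditBuffer` type; returning the evicted entry is a design choice made here so the caller can confirm it already reached persistent storage.

```rust
use std::collections::VecDeque;

/// Hypothetical bounded buffer holding the most recent audit entries.
pub struct AuditBuffer<T> {
    capacity: usize,
    entries: VecDeque<T>,
}

impl<T> AuditBuffer<T> {
    pub fn new(capacity: usize) -> Self {
        AuditBuffer { capacity, entries: VecDeque::with_capacity(capacity) }
    }

    /// Append an entry. When the buffer is full, the oldest entry is
    /// evicted and returned to the caller.
    pub fn push(&mut self, entry: T) -> Option<T> {
        if self.capacity == 0 {
            return Some(entry); // nothing can be retained
        }
        let evicted = if self.entries.len() == self.capacity {
            self.entries.pop_front()
        } else {
            None
        };
        self.entries.push_back(entry);
        evicted
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }

    /// Iterate from oldest to newest.
    pub fn iter(&self) -> impl Iterator<Item = &T> {
        self.entries.iter()
    }
}
```

In this model the document's setting corresponds to `AuditBuffer::new(10_000)`.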
1.4 Key Fingerprinting
Implementation:
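fn calculate_fingerprint(public_key: &[u8]) -> Result<String, KmsError> {
use base64::{engine::general_purpose::STANDARD_NO_PAD, Engine as _};
use sha2::{Digest, Sha256};
// `public_key` should be the decoded key blob, not the "ssh-ed25519 AAAA..." text line.
// Encode the full 32-byte digest without padding so the result matches `ssh-keygen -l`;
// a truncated digest would not match fingerprints of deployed keys.
let digest = Sha256::digest(public_key);
Ok(format!("SHA256:{}", STANDARD_NO_PAD.encode(digest)))
}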
Security Benefits:
- Verify key integrity
- Detect key tampering
- Match deployed keys to KMS records
2. RBAC Security
2.1 Execution Modes
Security Model by Mode:
| Mode | Security Level | Use Case | Audit Required |
|---|---|---|---|
| Solo | Low | Single developer | No |
| MultiUser | Medium | Small teams | Optional |
| CICD | Medium | Automation | Yes |
| Enterprise | High | Production | Mandatory |
Mode-Specific Security:
Solo Mode
// Solo mode: All users are admin
// Security: Trust-based, no RBAC checks
if mode == ExecutionMode::Solo {
return true; // Allow all operations
}
Risks:
- No access control
- No audit trail
- Single point of failure
Mitigations:
- Only for development environments
- Network isolation required
- Regular backups
MultiUser Mode
// Multi-user: Role-based access control
let permissions = rbac_manager.get_user_permissions(&user).await;
if !permissions.contains(&required_permission) {
return Err(RbacError::PermissionDenied);
}
Security Features:
- Role-based permissions
- Optional audit logging
- Session management
CICD Mode
// CICD: Service account focused
// All actions logged for automation tracking
if mode == ExecutionMode::CICD {
audit_log.log_automation_action(service_account, action).await;
}
Security Features:
- Service account isolation
- Mandatory audit logging
- Token-based authentication
- Short-lived credentials
Enterprise Mode
// Enterprise: Full security
// - Mandatory audit logging
// - Stricter session timeouts
// - Compliance reports
if mode == ExecutionMode::Enterprise {
audit_log.log_with_compliance(user, action, compliance_tags).await;
}
Security Features:
- Full RBAC enforcement
- Comprehensive audit logging
- Compliance reporting
- Role assignment approval workflow
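The per-mode rules above can be collected into one place so that handlers do not repeat `if mode == ...` checks. The sketch below restates the mode table; `AuditPolicy`, `audit_policy`, and `enforces_rbac` are hypothetical names, and treating CICD as RBAC-enforcing is an assumption (the source only states explicitly that Solo skips checks).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExecutionMode {
    Solo,
    MultiUser,
    CICD,
    Enterprise,
}

/// Hypothetical summary of the "Audit Required" column.
#[derive(Debug, PartialEq, Eq)]
pub enum AuditPolicy {
    Off,
    Optional,
    Mandatory,
}

impl ExecutionMode {
    pub fn audit_policy(self) -> AuditPolicy {
        match self {
            ExecutionMode::Solo => AuditPolicy::Off,
            ExecutionMode::MultiUser => AuditPolicy::Optional,
            ExecutionMode::CICD | ExecutionMode::Enterprise => AuditPolicy::Mandatory,
        }
    }

    /// Solo is trust-based and skips RBAC checks; every other mode is
    /// assumed to enforce them.
    pub fn enforces_rbac(self) -> bool {
        self != ExecutionMode::Solo
    }
}
```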
2.2 Permission System
Permission Levels:
Role::Admin => 100 // Full access
Role::Operator => 80 // Deploy & manage
Role::Developer => 60 // Read + dev deploy
Role::ServiceAccount => 50 // Automation
Role::Auditor => 40 // Read + audit
Role::Viewer => 20 // Read-only
Action Security Levels:
Action::Delete => 100 // Destructive, admin only
Action::Manage => 80 // Service management
Action::Deploy => 80 // Deploy to production
Action::Create => 60 // Create resources
Action::Update => 60 // Modify resources
Action::Execute => 50 // Execute operations
Action::Audit => 40 // View audit logs
Action::Read => 20 // View resources
Permission Check:
pub fn can_perform(&self, required_level: u8) -> bool {
self.permission_level() >= required_level
}
Security Guarantees:
- ✅ Least privilege by default (Viewer role)
- ✅ Hierarchical permissions (higher roles include lower)
- ✅ Explicit deny for unknown resources
- ✅ No permission escalation without admin
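Putting the two level tables and the check together gives a small, testable model. The numeric levels are copied from the tables above; the `required_level` and `allows` helpers are hypothetical names added for this sketch.

```rust
#[derive(Debug, Clone, Copy)]
pub enum Role { Admin, Operator, Developer, ServiceAccount, Auditor, Viewer }

#[derive(Debug, Clone, Copy)]
pub enum Action { Delete, Manage, Deploy, Create, Update, Execute, Audit, Read }

impl Role {
    pub fn permission_level(self) -> u8 {
        match self {
            Role::Admin => 100,
            Role::Operator => 80,
            Role::Developer => 60,
            Role::ServiceAccount => 50,
            Role::Auditor => 40,
            Role::Viewer => 20,
        }
    }

    pub fn can_perform(self, required_level: u8) -> bool {
        self.permission_level() >= required_level
    }

    /// Convenience wrapper: compare the role level with the action level.
    pub fn allows(self, action: Action) -> bool {
        self.can_perform(action.required_level())
    }
}

impl Action {
    pub fn required_level(self) -> u8 {
        match self {
            Action::Delete => 100,
            Action::Manage | Action::Deploy => 80,
            Action::Create | Action::Update => 60,
            Action::Execute => 50,
            Action::Audit => 40,
            Action::Read => 20,
        }
    }
}
```

One consequence of a purely numeric hierarchy is worth noting: every role at level 40 or above, including Developer and ServiceAccount, can view audit logs. If that is not intended, the comparison would need to be replaced or supplemented by explicit per-role permission sets.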
2.3 Session Security
Session Configuration:
[security]
# TOML forbids duplicate keys, so set one value per deployment:
# 60 for Solo/MultiUser, 30 for Enterprise
session_timeout_minutes = 60
max_sessions_per_user = 5
failed_login_lockout_attempts = 5
failed_login_lockout_duration_minutes = 15
Session Lifecycle:
- User authenticates → JWT token issued
- Token includes: user_id, role, issued_at, expires_at
- Middleware validates token on each request
- Session tracked in Redis/RocksDB
- Session invalidated on logout or timeout
Security Features:
- JWT with RSA-2048 signatures
- Refresh token rotation
- Session fixation prevention
- Concurrent session limits
Threat Mitigation:
- ✅ Session Hijacking: Short-lived tokens (1 hour)
- ✅ Token Replay: One-time refresh tokens
- ✅ Brute Force: Account lockout after 5 failures
- ✅ Session Fixation: New session ID on login
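The brute-force mitigation (lockout after 5 failures for 15 minutes) can be sketched as a small state machine. `LoginGuard` and its methods are hypothetical; timestamps are passed in as plain seconds so the logic is easy to test, and resetting the failure counter when a lockout starts is an assumption.

```rust
use std::collections::HashMap;

pub struct LockoutPolicy {
    pub max_attempts: u32, // failed_login_lockout_attempts
    pub lockout_secs: u64, // failed_login_lockout_duration_minutes * 60
}

#[derive(Default)]
struct LoginState {
    failures: u32,
    locked_until: Option<u64>,
}

/// Hypothetical per-user tracker of failed logins.
pub struct LoginGuard {
    policy: LockoutPolicy,
    states: HashMap<String, LoginState>,
}

impl LoginGuard {
    pub fn new(policy: LockoutPolicy) -> Self {
        LoginGuard { policy, states: HashMap::new() }
    }

    /// True while the user's lockout has not yet expired.
    pub fn is_locked(&self, user: &str, now: u64) -> bool {
        self.states
            .get(user)
            .and_then(|state| state.locked_until)
            .map_or(false, |until| now < until)
    }

    /// Count a failed login; start a lockout once the limit is reached.
    pub fn record_failure(&mut self, user: &str, now: u64) {
        let state = self.states.entry(user.to_string()).or_default();
        state.failures += 1;
        if state.failures >= self.policy.max_attempts {
            state.locked_until = Some(now + self.policy.lockout_secs);
            state.failures = 0; // start counting afresh after the lockout
        }
    }

    /// A successful login clears the user's failure history.
    pub fn record_success(&mut self, user: &str) {
        self.states.remove(user);
    }
}
```

The login handler would call `is_locked` before verifying credentials, so that attempts made during a lockout are rejected without touching the password check.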
2.4 Middleware Security
RBAC Middleware Flow:
Request → Auth Middleware → RBAC Middleware → Handler
                 ↓                 ↓
           Extract User      Check Permission
          from JWT Token     (role + resource + action)
                                   ↓
                              Allow / Deny
Middleware Implementation:
pub async fn check_permission(
State(state): State<Arc<RbacMiddleware>>,
resource: Resource,
action: Action,
req: Request,
next: Next,
) -> Result<Response, RbacError> {
// Clone the user so that no borrow of `req` is held across the await below
let user = req.extensions()
.get::<User>()
.cloned()
.ok_or_else(|| RbacError::UserNotFound("No user in request".to_string()))?;
if !state.rbac_manager.check_permission(&user, resource, action).await {
return Err(RbacError::PermissionDenied);
}
Ok(next.run(req).await)
}
Security Guarantees:
- ✅ All API endpoints protected by default
- ✅ Permission checked before handler execution
- ✅ User context available in handlers
- ✅ Failed checks logged for audit
3. Platform Monitoring Security
3.1 Service Access Security
Internal URLs Only:
[platform]
orchestrator_url = "http://localhost:8080" # Not exposed externally
coredns_url = "http://localhost:9153"
gitea_url = "http://localhost:3000"
oci_registry_url = "http://localhost:5000"
Network Security:
- All services on localhost or internal network
- No external exposure of monitoring endpoints
- Firewall rules to prevent external access
Threat Mitigation:
- ✅ External Scanning: Services not reachable from internet
- ✅ DDoS: Internal-only access limits attack surface
- ✅ Data Exfiltration: Monitoring data not exposed externally
3.2 Health Check Security
Timeout Protection:
let client = Client::builder()
.timeout(std::time::Duration::from_secs(5)) // Prevent hanging
.build()
.expect("failed to build HTTP client"); // Fails only on invalid client configuration
Error Handling:
// Never expose internal errors to users
Err(e) => {
// Log detailed error internally
tracing::error!("Health check failed for {}: {}", service, e);
// Return generic error externally
ServiceStatus {
status: HealthStatus::Unhealthy,
error_message: Some("Service unavailable".to_string()), // Generic
..
}
}
Threat Mitigation:
- ✅ Timeout Attacks: 5-second timeout prevents resource exhaustion
- ✅ Information Disclosure: Error messages sanitized
- ✅ Resource Exhaustion: Parallel checks with concurrency limits
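The combination of parallel checks, a concurrency limit, and sanitized errors can be sketched with the standard library alone. This is an illustration under stated assumptions: `check_all` and `to_status` are hypothetical, the probe is an injected closure rather than a real HTTP call, and batching with scoped threads stands in for whatever async mechanism the real service uses.

```rust
use std::thread;

#[derive(Debug, PartialEq, Eq)]
pub enum HealthStatus { Healthy, Unhealthy }

#[derive(Debug)]
pub struct ServiceStatus {
    pub name: String,
    pub status: HealthStatus,
    pub error_message: Option<String>,
}

/// Run `probe` for every service with at most `max_parallel` probes in
/// flight. Results are returned in the same order as `services`.
pub fn check_all<F>(services: &[&str], max_parallel: usize, probe: F) -> Vec<ServiceStatus>
where
    F: Fn(&str) -> Result<(), String> + Sync,
{
    let mut results = Vec::with_capacity(services.len());
    // Each batch runs in parallel; the next batch starts when it finishes.
    for batch in services.chunks(max_parallel.max(1)) {
        let batch_results: Vec<ServiceStatus> = thread::scope(|scope| {
            let handles: Vec<_> = batch
                .iter()
                .copied()
                .map(|name| {
                    let probe = &probe;
                    scope.spawn(move || to_status(name, probe(name)))
                })
                .collect();
            handles
                .into_iter()
                .map(|handle| handle.join().expect("probe thread panicked"))
                .collect()
        });
        results.extend(batch_results);
    }
    results
}

/// Keep the detailed error server-side and return a generic message.
fn to_status(name: &str, outcome: Result<(), String>) -> ServiceStatus {
    match outcome {
        Ok(()) => ServiceStatus {
            name: name.to_string(),
            status: HealthStatus::Healthy,
            error_message: None,
        },
        Err(detail) => {
            // Stand-in for `tracing::error!` in the real service.
            eprintln!("Health check failed for {}: {}", name, detail);
            ServiceStatus {
                name: name.to_string(),
                status: HealthStatus::Unhealthy,
                error_message: Some("Service unavailable".to_string()),
            }
        }
    }
}
```

Batching is the simplest way to bound concurrency, at the cost of each batch waiting for its slowest probe; with the 5-second timeout above, that wait is itself bounded.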
3.3 Service Control Security
RBAC-Protected Service Control:
// Only Operator or Admin can start/stop services
// (assumes AppState exposes rbac_manager, service_manager, and audit_log)
#[axum::debug_handler]
pub async fn start_service(
State(state): State<AppState>,
Extension(user): Extension<User>,
Path(service_type): Path<String>,
) -> Result<StatusCode, ApiError> {
// Check permission
if !state.rbac_manager.check_permission(
&user,
Resource::Service,
Action::Manage,
).await {
return Err(ApiError::PermissionDenied);
}
// Start service
state.service_manager.start_service(&service_type).await?;
// Audit log
state.audit_log.log_service_action(user, service_type, "start").await;
Ok(StatusCode::OK)
}
Security Guarantees:
- ✅ Only authorized users can control services
- ✅ All service actions logged
- ✅ Graceful degradation on service failure
4. Threat Model
4.1 High-Risk Threats
Threat: SSH Private Key Exposure
Attack Vector: Attacker gains access to KMS database
Mitigations:
- Private keys encrypted at rest with master key
- Master key stored in hardware security module (HSM) or KMS; HSM integration is planned for Phase 3
- Key access audited and rate-limited
- Zeroization of decrypted keys in memory
Detection:
- Audit log monitoring for unusual key access patterns
- Alerting on bulk key retrievals
Threat: Privilege Escalation
Attack Vector: Lower-privileged user attempts to gain admin access
Mitigations:
- Role assignment requires Admin role
- Mode switching requires Admin role
- Middleware enforces permissions on every request
- No client-side permission checks (server-side only)
Detection:
- Failed permission checks logged
- Alerting on repeated permission denials
Threat: Session Hijacking
Attack Vector: Attacker steals JWT token
Mitigations:
- Short-lived access tokens (1 hour)
- Refresh token rotation
- Secure HTTP-only cookies (recommended)
- IP address binding (optional)
Detection:
- Unusual login locations
- Concurrent sessions from different IPs
4.2 Medium-Risk Threats
Threat: Service Impersonation
Attack Vector: Malicious service pretends to be legitimate platform service
Mitigations:
- Service URLs configured in config file (not dynamic)
- TLS certificate validation (if HTTPS)
- Service authentication tokens
Detection:
- Health check failures
- Metrics anomalies
Threat: Audit Log Tampering
Attack Vector: Attacker modifies audit logs to hide tracks
Mitigations:
- Audit logs append-only
- Logs stored in SurrealDB
- Hash chain over log entries for tamper evidence
- Offsite log backup
Detection:
- Hash chain verification
- Log gap detection
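A hash chain links each audit entry to its predecessor, so that editing or removing an entry breaks every later link. The sketch below shows the mechanism only. `ChainedEntry`, `append`, and `first_broken_link` are hypothetical names, and `DefaultHasher` is used purely to keep the example dependency-free: it is NOT a cryptographic hash, and a real chain would use something like SHA-256.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone)]
pub struct ChainedEntry {
    pub message: String,
    pub prev_hash: u64, // hash of the previous entry, 0 for the first
    pub hash: u64,      // hash over (prev_hash, message)
}

// NOT cryptographic: a stand-in for SHA-256 in this sketch.
fn link_hash(prev_hash: u64, message: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    prev_hash.hash(&mut hasher);
    message.hash(&mut hasher);
    hasher.finish()
}

/// Append a message, linking it to the current tail of the chain.
pub fn append(log: &mut Vec<ChainedEntry>, message: &str) {
    let prev_hash = log.last().map_or(0, |entry| entry.hash);
    log.push(ChainedEntry {
        message: message.to_string(),
        prev_hash,
        hash: link_hash(prev_hash, message),
    });
}

/// Verify the chain; returns the index of the first entry that was
/// modified or that no longer follows its predecessor.
pub fn first_broken_link(log: &[ChainedEntry]) -> Option<usize> {
    let mut expected_prev = 0u64;
    for (index, entry) in log.iter().enumerate() {
        let recomputed = link_hash(entry.prev_hash, &entry.message);
        if entry.prev_hash != expected_prev || entry.hash != recomputed {
            return Some(index);
        }
        expected_prev = entry.hash;
    }
    None
}
```

An attacker who can rewrite the whole log can also recompute the whole chain, which is why the chain head should be copied somewhere the attacker cannot reach; the offsite backup listed above can serve that purpose.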
4.3 Low-Risk Threats
Threat: Information Disclosure via Error Messages
Attack Vector: Error messages leak internal information
Mitigations:
- Generic error messages for users
- Detailed errors only in server logs
- Error message sanitization
Detection:
- Code review for error handling
- Automated scanning for sensitive data in responses
5. Compliance Considerations
5.1 GDPR Compliance
Personal Data Handling:
- User information: username, email, IP addresses
- Retention: Audit logs kept for required period
- Right to erasure: User deletion removes the user record and SSH keys; audit log entries are retained for compliance but anonymized
Implementation:
// Delete user and all associated data
pub async fn delete_user(&self, user_id: &str) -> Result<(), RbacError> {
// Delete user SSH keys
for key in self.list_user_ssh_keys(user_id).await? {
self.delete_ssh_key(&key.key_id).await?;
}
// Anonymize audit logs (retain for compliance, remove PII)
self.anonymize_user_audit_logs(user_id).await?;
// Delete user record
self.delete_user_record(user_id).await?;
Ok(())
}
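The anonymization step can be sketched as an in-place rewrite of the affected entries. The entry type here is a trimmed, hypothetical version of `SshKeyAuditEntry`, and `anonymize_user` is an illustrative stand-in for `anonymize_user_audit_logs`, whose real signature and storage access are not shown in this document.

```rust
/// Trimmed, hypothetical audit entry with the PII-bearing fields.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AuditEntry {
    pub key_id: String,
    pub user: Option<String>,
    pub ip_address: Option<String>,
    pub success: bool,
}

/// Replace the user name with a pseudonym and drop the source IP on
/// every entry belonging to `user_id`. Returns the number of entries
/// changed. Non-PII fields (key ID, outcome) are kept for compliance.
pub fn anonymize_user(entries: &mut [AuditEntry], user_id: &str, pseudonym: &str) -> usize {
    let mut changed = 0;
    for entry in entries.iter_mut() {
        if entry.user.as_deref() == Some(user_id) {
            entry.user = Some(pseudonym.to_string());
            entry.ip_address = None;
            changed += 1;
        }
    }
    changed
}
```

The pseudonym should not be derivable from the original identity (for example, not an unsalted hash of the username), otherwise the entries remain linkable to the person.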
5.2 SOC 2 Compliance
Security Controls:
- ✅ Access control (RBAC)
- ✅ Audit logging (all actions logged)
- ✅ Encryption at rest (KMS)
- ✅ Encryption in transit (HTTPS recommended)
- ✅ Session management (timeouts; MFA planned for Phase 3)
Monitoring & Alerting:
- ✅ Service health monitoring
- ✅ Failed login tracking
- ✅ Permission denial alerting
- ✅ Unusual activity detection
5.3 PCI DSS (if applicable)
Requirements:
- ✅ Encrypt cardholder data (use KMS for keys)
- ✅ Maintain access control (RBAC)
- ✅ Track and monitor access (audit logs)
- ✅ Regularly test security (integration tests, in progress under Phase 2)
6. Security Best Practices
6.1 Development
Code Review Checklist:
- All API endpoints have RBAC middleware
- No hardcoded secrets or keys
- Error messages don't leak sensitive info
- Audit logging for sensitive operations
- Input validation on all user inputs
- SQL injection prevention (use parameterized queries)
- XSS prevention (escape user inputs)
Testing:
- Unit tests for permission checks
- Integration tests for RBAC enforcement
- Penetration testing for production deployments
6.2 Deployment
Production Checklist:
- Change default admin password
- Enable HTTPS with valid certificate
- Configure firewall rules (internal services only)
- Set appropriate execution mode (Enterprise for production)
- Enable audit logging
- Configure session timeout (30 minutes for Enterprise)
- Enable rate limiting
- Set up log monitoring and alerting
- Regular security updates
- Backup encryption keys
6.3 Operations
Incident Response:
- Detection: Monitor audit logs for anomalies
- Containment: Revoke compromised credentials
- Eradication: Rotate affected SSH keys
- Recovery: Restore from backup if needed
- Lessons Learned: Update security controls
Key Rotation Schedule:
- SSH keys: Every 90 days (Enterprise: 30 days)
- JWT signing keys: Every 180 days
- Master encryption key: Every 365 days
- Service account tokens: Every 30 days
7. Security Metrics
7.1 Monitoring Metrics
Authentication:
- Failed login attempts per user
- Concurrent sessions per user
- Session duration (average, p95, p99)
Authorization:
- Permission denials per user
- Permission denials per resource
- Role assignments per day
Audit:
- SSH key accesses per day
- SSH key rotations per month
- Audit log retention compliance
Services:
- Service health check success rate
- Service response times (p50, p95, p99)
- Service dependency failures
7.2 Alerting Thresholds
Critical Alerts:
- Service health: >3 failures in 5 minutes
- Failed logins: >10 attempts in 1 minute
- Permission denials: >50 in 1 minute
- SSH key bulk retrieval: >10 keys in 1 minute
Warning Alerts:
- Service degraded: response time >1 second
- Session timeout rate: >10% of sessions
- Audit log storage: >80% capacity
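The critical thresholds all have the form "more than N events within a time window", which a sliding-window counter can evaluate. A minimal sketch, assuming a hypothetical `RateAlert` type and monotonically increasing timestamps in seconds:

```rust
use std::collections::VecDeque;

/// Hypothetical sliding-window counter for "more than N in T seconds".
pub struct RateAlert {
    threshold: usize,
    window_secs: u64,
    events: VecDeque<u64>, // timestamps of recent events, oldest first
}

impl RateAlert {
    pub fn new(threshold: usize, window_secs: u64) -> Self {
        RateAlert { threshold, window_secs, events: VecDeque::new() }
    }

    /// Record an event at `now` and report whether the number of events
    /// inside the window ending at `now` exceeds the threshold.
    pub fn record(&mut self, now: u64) -> bool {
        self.events.push_back(now);
        // Drop events that have fallen out of the window.
        while let Some(&oldest) = self.events.front() {
            if now.saturating_sub(oldest) >= self.window_secs {
                self.events.pop_front();
            } else {
                break;
            }
        }
        self.events.len() > self.threshold
    }
}
```

For example, `RateAlert::new(10, 60)` models the failed-login rule and `RateAlert::new(50, 60)` the permission-denial rule; one instance per user or per resource keeps the counts separate.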
8. Security Roadmap
Phase 1 (Completed)
- ✅ SSH key storage with encryption
- ✅ Mode-based RBAC
- ✅ Audit logging
- ✅ Platform monitoring
Phase 2 (In Progress)
- 📋 API handlers with RBAC enforcement
- 📋 Integration tests for security
- 📋 Documentation
Phase 3 (Future)
- Multi-factor authentication (MFA)
- Hardware security module (HSM) integration
- Advanced threat detection (ML-based)
- Automated security scanning
- Compliance report generation
- Security information and event management (SIEM) integration
References
- OWASP Top 10: https://owasp.org/www-project-top-ten/
- NIST Cybersecurity Framework: https://www.nist.gov/cyberframework
- CIS Controls: https://www.cisecurity.org/controls
- GDPR: https://gdpr.eu/
- SOC 2: https://www.aicpa.org/interestareas/frc/assuranceadvisoryservices/socforserviceorganizations.html
Last Updated: 2025-10-06
Review Cycle: Quarterly
Next Review: 2026-01-06