# Security Considerations for Control Center Enhancements

## Overview

This document outlines the security architecture and considerations for the control-center enhancements, including KMS SSH key management, mode-based RBAC, and platform service monitoring.

## 1. SSH Key Management Security

### 1.1 Key Storage Security

**Implementation**:
- Private keys encrypted at rest using AES-256-GCM in KMS
- Public keys stored in plaintext (as they are meant to be public)
- Private key material never exposed in API responses
- Key IDs used as references, not actual keys

**Threat Mitigation**:
- ✅ **Data at Rest**: All private keys encrypted with master encryption key
- ✅ **Key Exposure**: Private keys only decrypted in memory when needed
- ✅ **Key Leakage**: Zeroization of key material after use
- ✅ **Unauthorized Access**: KMS access controlled by RBAC

**Best Practices**:
```rust
// Good: Using key ID reference
let key_id = ssh_key_manager.store_ssh_key(name, private, public, purpose, tags).await?;

// Bad: Never do this - exposing private key in logs
tracing::info!("Stored key: {}", private_key); // DON'T DO THIS
```

### 1.2 Key Rotation Security

**Implementation**:
- Configurable rotation intervals (default 90 days)
- Grace period for old key usage (default 7 days)
- Automatic rotation scheduling (if enabled)
- Manual rotation support with immediate effect

**Threat Mitigation**:
- ✅ **Key Compromise**: Regular rotation limits exposure window
- ✅ **Stale Keys**: Automated detection of keys due for rotation
- ✅ **Rotation Failures**: Graceful degradation with error logging

**Rotation Policy**:
```toml
[kms.ssh_keys]
rotation_enabled = true
rotation_interval_days = 90  # Enterprise: 30, Dev: 180
grace_period_days = 7        # Time to update deployed keys
auto_rotate = false          # Manual approval recommended
```

### 1.3 Audit Logging

**Logged Events**:
- SSH key creation (who, when, purpose)
- SSH key retrieval (who accessed, when)
- SSH key rotation (old key ID, new key ID)
- SSH key deletion (who deleted, when)
- Failed access attempts

**Audit Entry Structure**:
```rust
// Type parameters below are assumed (chrono `DateTime<Utc>`, `Option<String>`).
pub struct SshKeyAuditEntry {
    pub timestamp: DateTime<Utc>,
    pub key_id: String,
    pub action: SshKeyAction,
    pub user: Option<String>,          // User who performed action
    pub ip_address: Option<String>,    // Source IP
    pub success: bool,
    pub error_message: Option<String>,
}
```

**Threat Mitigation**:
- ✅ **Unauthorized Access**: Full audit trail for forensics
- ✅ **Insider Threats**: User attribution for all actions
- ✅ **Compliance**: Supports GDPR/SOC 2 audit log requirements

**Audit Log Retention**:
- In-memory: Last 10,000 entries
- Persistent: SurrealDB with 1-year retention
- Compliance mode: 7-year retention (configurable)

### 1.4 Key Fingerprinting

**Implementation**:
```rust
// `Result` is assumed to be the crate's own error alias.
fn calculate_fingerprint(public_key: &[u8]) -> Result<String> {
    use sha2::{Sha256, Digest};

    let mut hasher = Sha256::new();
    hasher.update(public_key);
    let result = hasher.finalize();

    // Only the first 16 bytes of the digest are encoded, so this value will
    // not match the full SHA-256 fingerprint printed by `ssh-keygen -l`.
    Ok(format!("SHA256:{}", base64::encode(&result[..16])))
}
```

**Security Benefits**:
- Verify key integrity
- Detect key tampering
- Match deployed keys to KMS records

## 2. RBAC Security

### 2.1 Execution Modes

**Security Model by Mode**:

| Mode | Security Level | Use Case | Audit Required |
|------|----------------|----------|----------------|
| Solo | Low | Single developer | No |
| MultiUser | Medium | Small teams | Optional |
| CICD | Medium | Automation | Yes |
| Enterprise | High | Production | Mandatory |

**Mode-Specific Security**:

#### Solo Mode

```rust
// Solo mode: All users are admin
// Security: Trust-based, no RBAC checks
if mode == ExecutionMode::Solo {
    return true; // Allow all operations
}
```

**Risks**:
- No access control
- No audit trail
- Single point of failure

**Mitigations**:
- Only for development environments
- Network isolation required
- Regular backups

#### MultiUser Mode

```rust
// Multi-user: Role-based access control
let permissions = rbac_manager.get_user_permissions(&user).await;
if !permissions.contains(&required_permission) {
    return Err(RbacError::PermissionDenied);
}
```

**Security Features**:
- Role-based permissions
- Optional audit logging
- Session management

#### CICD Mode

```rust
// CICD: Service account focused
// All actions logged for automation tracking
if mode == ExecutionMode::CICD {
    audit_log.log_automation_action(service_account, action).await;
}
```

**Security Features**:
- Service account isolation
- Mandatory audit logging
- Token-based authentication
- Short-lived credentials

#### Enterprise Mode

```rust
// Enterprise: Full security
// - Mandatory audit logging
// - Stricter session timeouts
// - Compliance reports
if mode == ExecutionMode::Enterprise {
    audit_log.log_with_compliance(user, action, compliance_tags).await;
}
```

**Security Features**:
- Full RBAC enforcement
- Comprehensive audit logging
- Compliance reporting
- Role assignment approval workflow

### 2.2 Permission System

**Permission Levels**:
```rust
Role::Admin          => 100, // Full access
Role::Operator       => 80,  // Deploy & manage
Role::Developer      => 60,  // Read + dev deploy
Role::ServiceAccount => 50,  // Automation
Role::Auditor        => 40,  // Read + audit
Role::Viewer         => 20,  // Read-only
```

**Action Security Levels**:
```rust
Action::Delete  => 100, // Destructive, admin only
Action::Manage  => 80,  // Service management
Action::Deploy  => 80,  // Deploy to production
Action::Create  => 60,  // Create resources
Action::Update  => 60,  // Modify resources
Action::Execute => 50,  // Execute operations
Action::Audit   => 40,  // View audit logs
Action::Read    => 20,  // View resources
```

**Permission Check**:
```rust
pub fn can_perform(&self, required_level: u8) -> bool {
    self.permission_level() >= required_level
}
```

**Security Guarantees**:
- ✅ Least privilege by default (Viewer role)
- ✅ Hierarchical permissions (higher roles include lower)
- ✅ Explicit deny for unknown resources
- ✅ No permission escalation without admin

### 2.3 Session Security

**Session Configuration**:
```toml
[security]
# TOML does not allow the same key twice in one table, so only one value
# can be active at a time.
session_timeout_minutes = 60  # Solo/MultiUser; set to 30 for Enterprise
max_sessions_per_user = 5
failed_login_lockout_attempts = 5
failed_login_lockout_duration_minutes = 15
```

**Session Lifecycle**:
1. User authenticates → JWT token issued
2. Token includes: user_id, role, issued_at, expires_at
3. Middleware validates token on each request
4. Session tracked in Redis/RocksDB
5. Session invalidated on logout or timeout

**Security Features**:
- JWT with RSA-2048 signatures
- Refresh token rotation
- Session fixation prevention
- Concurrent session limits

**Threat Mitigation**:
- ✅ **Session Hijacking**: Short-lived tokens (1 hour)
- ✅ **Token Replay**: One-time refresh tokens
- ✅ **Brute Force**: Account lockout after 5 failures
- ✅ **Session Fixation**: New session ID on login

### 2.4 Middleware Security

**RBAC Middleware Flow**:
```
Request → Auth Middleware → RBAC Middleware → Handler
               ↓                  ↓
          Extract User      Check Permission
         from JWT Token   (role + resource + action)
                                  ↓
                             Allow / Deny
```

**Middleware Implementation**:
```rust
// `AppState` and `User` are assumed names for the shared state and the
// authenticated-user type placed in the request extensions.
pub async fn check_permission(
    State(state): State<Arc<AppState>>,
    resource: Resource,
    action: Action,
    req: Request,
    next: Next,
) -> Result<Response, RbacError> {
    // The auth middleware is expected to have stored the user already.
    let user = req.extensions()
        .get::<User>()
        .ok_or(RbacError::UserNotFound("No user in request".to_string()))?;

    if !state.rbac_manager.check_permission(user, resource, action).await {
        return Err(RbacError::PermissionDenied);
    }

    Ok(next.run(req).await)
}
```

**Security Guarantees**:
- ✅ All API endpoints protected by default
- ✅ Permission checked before handler execution
- ✅ User context available in handlers
- ✅ Failed checks logged for audit

## 3. Platform Monitoring Security

### 3.1 Service Access Security

**Internal URLs Only**:
```toml
[platform]
orchestrator_url = "http://localhost:8080"  # Not exposed externally
coredns_url = "http://localhost:9153"
gitea_url = "http://localhost:3000"
oci_registry_url = "http://localhost:5000"
```

**Network Security**:
- All services on localhost or internal network
- No external exposure of monitoring endpoints
- Firewall rules to prevent external access

**Threat Mitigation**:
- ✅ **External Scanning**: Services not reachable from internet
- ✅ **DDoS**: Internal-only access limits attack surface
- ✅ **Data Exfiltration**: Monitoring data not exposed externally

### 3.2 Health Check Security

**Timeout Protection**:
```rust
let client = Client::builder()
    .timeout(std::time::Duration::from_secs(5)) // Prevent hanging
    .build()
    .expect("failed to build HTTP client");
```

**Error Handling**:
```rust
// Never expose internal errors to users
Err(e) => {
    // Log detailed error internally
    tracing::error!("Health check failed for {}: {}", service, e);

    // Return generic error externally
    ServiceStatus {
        status: HealthStatus::Unhealthy,
        error_message: Some("Service unavailable".to_string()), // Generic
        .. // remaining fields elided
    }
}
```

**Threat Mitigation**:
- ✅ **Timeout Attacks**: 5-second timeout prevents resource exhaustion
- ✅ **Information Disclosure**: Error messages sanitized
- ✅ **Resource Exhaustion**: Parallel checks with concurrency limits

### 3.3 Service Control Security

**RBAC-Protected Service Control**:
```rust
// Only Operator or Admin can start/stop services.
// `AppState`, `User`, and the `state.*` field names are assumed.
#[axum::debug_handler]
pub async fn start_service(
    State(state): State<Arc<AppState>>,
    Extension(user): Extension<User>,
    Path(service_type): Path<String>,
) -> Result<StatusCode, ApiError> {
    // Check permission
    if !state.rbac_manager.check_permission(
        &user,
        Resource::Service,
        Action::Manage,
    ).await {
        return Err(ApiError::PermissionDenied);
    }

    // Start service
    state.service_manager.start_service(&service_type).await?;

    // Audit log
    state.audit_log.log_service_action(user, service_type, "start").await;

    Ok(StatusCode::OK)
}
```

**Security Guarantees**:
- ✅ Only authorized users can control services
- ✅ All service actions logged
- ✅ Graceful degradation on service failure

## 4. Threat Model

### 4.1 High-Risk Threats

#### Threat: SSH Private Key Exposure

**Attack Vector**: Attacker gains access to KMS database

**Mitigations**:
- Private keys encrypted at rest with master key
- Master key stored in hardware security module (HSM) or KMS
- Key access audited and rate-limited
- Zeroization of decrypted keys in memory

**Detection**:
- Audit log monitoring for unusual key access patterns
- Alerting on bulk key retrievals

#### Threat: Privilege Escalation

**Attack Vector**: Lower-privileged user attempts to gain admin access

**Mitigations**:
- Role assignment requires Admin role
- Mode switching requires Admin role
- Middleware enforces permissions on every request
- No client-side permission checks (server-side only)

**Detection**:
- Failed permission checks logged
- Alerting on repeated permission denials

#### Threat: Session Hijacking

**Attack Vector**: Attacker steals JWT token

**Mitigations**:
- Short-lived access tokens (1 hour)
- Refresh token rotation
- Secure HTTP-only cookies (recommended)
- IP address binding (optional)

**Detection**:
- Unusual login locations
- Concurrent sessions from different IPs

### 4.2 Medium-Risk Threats

#### Threat: Service Impersonation

**Attack Vector**: Malicious service pretends to be legitimate platform service

**Mitigations**:
- Service URLs configured in config file (not dynamic)
- TLS certificate validation (if HTTPS)
- Service authentication tokens

**Detection**:
- Health check failures
- Metrics anomalies

#### Threat: Audit Log Tampering

**Attack Vector**: Attacker modifies audit logs to hide tracks

**Mitigations**:
- Audit logs write-only
- Logs persisted in SurrealDB
- Hash chain for log integrity (makes tampering evident)
- Offsite log backup

**Detection**:
- Hash chain verification
- Log gap detection

### 4.3 Low-Risk Threats

#### Threat: Information Disclosure via Error Messages

**Attack Vector**: Error messages leak internal information

**Mitigations**:
- Generic error messages for users
- Detailed errors only in server logs
- Error message sanitization

**Detection**:
- Code review for error handling
- Automated scanning for sensitive data in responses

## 5. Compliance Considerations

### 5.1 GDPR Compliance

**Personal Data Handling**:
- User information: username, email, IP addresses
- Retention: Audit logs kept for required period
- Right to erasure: User deletion removes the user record and the user's SSH keys; audit log entries are retained but anonymized (PII removed)

**Implementation**:
```rust
// Delete user and all associated data
pub async fn delete_user(&self, user_id: &str) -> Result<(), RbacError> {
    // Delete user SSH keys
    for key in self.list_user_ssh_keys(user_id).await? {
        self.delete_ssh_key(&key.key_id).await?;
    }

    // Anonymize audit logs (retain for compliance, remove PII)
    self.anonymize_user_audit_logs(user_id).await?;

    // Delete user record
    self.delete_user_record(user_id).await?;

    Ok(())
}
```

### 5.2 SOC 2 Compliance

**Security Controls**:
- ✅ Access control (RBAC)
- ✅ Audit logging (all actions logged)
- ✅ Encryption at rest (KMS)
- ✅ Encryption in transit (HTTPS recommended)
- ✅ Session management (timeouts; MFA is planned for Phase 3)

**Monitoring & Alerting**:
- ✅ Service health monitoring
- ✅ Failed login tracking
- ✅ Permission denial alerting
- ✅ Unusual activity detection

### 5.3 PCI DSS (if applicable)

**Requirements**:
- ✅ Encrypt cardholder data (use KMS for keys)
- ✅ Maintain access control (RBAC)
- ✅ Track and monitor access (audit logs)
- ✅ Regularly test security (integration tests)

## 6. Security Best Practices

### 6.1 Development

**Code Review Checklist**:
- [ ] All API endpoints have RBAC middleware
- [ ] No hardcoded secrets or keys
- [ ] Error messages don't leak sensitive info
- [ ] Audit logging for sensitive operations
- [ ] Input validation on all user inputs
- [ ] SQL injection prevention (use parameterized queries)
- [ ] XSS prevention (escape user inputs)

**Testing**:
- Unit tests for permission checks
- Integration tests for RBAC enforcement
- Penetration testing for production deployments

### 6.2 Deployment

**Production Checklist**:
- [ ] Change default admin password
- [ ] Enable HTTPS with valid certificate
- [ ] Configure firewall rules (internal services only)
- [ ] Set appropriate execution mode (Enterprise for production)
- [ ] Enable audit logging
- [ ] Configure session timeout (30 minutes for Enterprise)
- [ ] Enable rate limiting
- [ ] Set up log monitoring and alerting
- [ ] Regular security updates
- [ ] Backup encryption keys

### 6.3 Operations

**Incident Response**:
1. **Detection**: Monitor audit logs for anomalies
2. **Containment**: Revoke compromised credentials
3. **Eradication**: Rotate affected SSH keys
4. **Recovery**: Restore from backup if needed
5. **Lessons Learned**: Update security controls

**Key Rotation Schedule**:
- SSH keys: Every 90 days (Enterprise: 30 days)
- JWT signing keys: Every 180 days
- Master encryption key: Every 365 days
- Service account tokens: Every 30 days

## 7. Security Metrics

### 7.1 Monitoring Metrics

**Authentication**:
- Failed login attempts per user
- Concurrent sessions per user
- Session duration (average, p95, p99)

**Authorization**:
- Permission denials per user
- Permission denials per resource
- Role assignments per day

**Audit**:
- SSH key accesses per day
- SSH key rotations per month
- Audit log retention compliance

**Services**:
- Service health check success rate
- Service response times (p50, p95, p99)
- Service dependency failures

### 7.2 Alerting Thresholds

**Critical Alerts**:
- Service health: >3 failures in 5 minutes
- Failed logins: >10 attempts in 1 minute
- Permission denials: >50 in 1 minute
- SSH key bulk retrieval: >10 keys in 1 minute

**Warning Alerts**:
- Service degraded: response time >1 second
- Session timeout rate: >10% of sessions
- Audit log storage: >80% capacity

## 8. Security Roadmap

### Phase 1 (Completed)
- ✅ SSH key storage with encryption
- ✅ Mode-based RBAC
- ✅ Audit logging
- ✅ Platform monitoring

### Phase 2 (In Progress)
- 📋 API handlers with RBAC enforcement
- 📋 Integration tests for security
- 📋 Documentation

### Phase 3 (Future)
- Multi-factor authentication (MFA)
- Hardware security module (HSM) integration
- Advanced threat detection (ML-based)
- Automated security scanning
- Compliance report generation
- Security information and event management (SIEM) integration

## References

- **OWASP Top 10**: https://owasp.org/www-project-top-ten/
- **NIST Cybersecurity Framework**: https://www.nist.gov/cyberframework
- **CIS Controls**: https://www.cisecurity.org/controls
- **GDPR**: https://gdpr.eu/
- **SOC 2**: https://www.aicpa.org/interestareas/frc/assuranceadvisoryservices/socforserviceorganizations.html

---

**Last Updated**: 2025-10-06

**Review Cycle**: Quarterly

**Next Review**: 2026-01-06
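
## Appendix: Permission Level Sketch

To make the level comparison from section 2.2 concrete, here is a minimal, self-contained sketch that combines the role levels, the action levels, and the `can_perform` check. The numeric values and `can_perform` come from section 2.2. The `Action::required_level` accessor and the `Role::allows` wrapper are hypothetical names introduced only for this illustration, and the actual control-center types may be organized differently.

```rust
/// Roles with the numeric levels listed in section 2.2.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    Admin,
    Operator,
    Developer,
    ServiceAccount,
    Auditor,
    Viewer,
}

/// Actions with the security levels listed in section 2.2.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    Delete,
    Manage,
    Deploy,
    Create,
    Update,
    Execute,
    Audit,
    Read,
}

impl Action {
    /// Hypothetical accessor: minimum role level needed for this action.
    pub fn required_level(self) -> u8 {
        match self {
            Action::Delete => 100,
            Action::Manage | Action::Deploy => 80,
            Action::Create | Action::Update => 60,
            Action::Execute => 50,
            Action::Audit => 40,
            Action::Read => 20,
        }
    }
}

impl Role {
    /// Numeric level of the role.
    pub fn permission_level(self) -> u8 {
        match self {
            Role::Admin => 100,
            Role::Operator => 80,
            Role::Developer => 60,
            Role::ServiceAccount => 50,
            Role::Auditor => 40,
            Role::Viewer => 20,
        }
    }

    /// Same comparison as `can_perform` in section 2.2.
    pub fn can_perform(self, required_level: u8) -> bool {
        self.permission_level() >= required_level
    }

    /// Hypothetical convenience wrapper around `can_perform`.
    pub fn allows(self, action: Action) -> bool {
        self.can_perform(action.required_level())
    }
}
```

With these values, `Role::Operator.allows(Action::Deploy)` is true while `Role::Operator.allows(Action::Delete)` is false. A pure level comparison also means `Role::Developer.allows(Action::Audit)` is true, because 60 >= 40. If audit log access is meant to be limited to specific roles, the level check alone would not enforce that, and an explicit per-resource rule would be needed.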