# Security Considerations for Control Center Enhancements

## Overview

This document outlines the security architecture and considerations for the control-center enhancements,
including KMS SSH key management, mode-based RBAC, and platform service monitoring.

## 1. SSH Key Management Security

### 1.1 Key Storage Security

**Implementation**:

- Private keys encrypted at rest using AES-256-GCM in KMS
- Public keys stored in plaintext (as they are meant to be public)
- Private key material never exposed in API responses
- Key IDs used as references, not actual keys

**Threat Mitigation**:

- ✅ **Data at Rest**: All private keys encrypted with master encryption key
- ✅ **Key Exposure**: Private keys only decrypted in memory when needed
- ✅ **Key Leakage**: Zeroization of key material after use
- ✅ **Unauthorized Access**: KMS access controlled by RBAC

**Best Practices**:

```rust
// Good: store the key and keep only the returned key ID as a reference
let key_id = ssh_key_manager.store_ssh_key(name, private, public, purpose, tags).await?;

// Bad: never write private key material to logs
tracing::info!("Stored key: {}", private); // DON'T DO THIS
```

### 1.2 Key Rotation Security

**Implementation**:

- Configurable rotation intervals (default 90 days)
- Grace period for old key usage (default 7 days)
- Automatic rotation scheduling (if enabled)
- Manual rotation support with immediate effect

**Threat Mitigation**:

- ✅ **Key Compromise**: Regular rotation limits exposure window
- ✅ **Stale Keys**: Automated detection of keys due for rotation
- ✅ **Rotation Failures**: Graceful degradation with error logging

**Rotation Policy**:

```toml
[kms.ssh_keys]
rotation_enabled = true
rotation_interval_days = 90 # Enterprise: 30, Dev: 180
grace_period_days = 7 # Time to update deployed keys
auto_rotate = false # Manual approval recommended
```

### 1.3 Audit Logging

**Logged Events**:

- SSH key creation (who, when, purpose)
- SSH key retrieval (who accessed, when)
- SSH key rotation (old key ID, new key ID)
- SSH key deletion (who deleted, when)
- Failed access attempts

**Audit Entry Structure**:

```rust
use chrono::{DateTime, Utc};

pub struct SshKeyAuditEntry {
    pub timestamp: DateTime<Utc>,
    pub key_id: String,
    pub action: SshKeyAction,
    pub user: Option<String>,       // User who performed action
    pub ip_address: Option<String>, // Source IP
    pub success: bool,
    pub error_message: Option<String>,
}
```

**Threat Mitigation**:

- ✅ **Unauthorized Access**: Full audit trail for forensics
- ✅ **Insider Threats**: User attribution for all actions
- ✅ **Compliance**: GDPR/SOC2 audit log requirements met

**Audit Log Retention**:

- In-memory: Last 10,000 entries
- Persistent: SurrealDB with 1-year retention
- Compliance mode: 7-year retention (configurable)

### 1.4 Key Fingerprinting

**Implementation**:

```rust
use sha2::{Digest, Sha256};

fn calculate_fingerprint(public_key: &[u8]) -> Result<String, KmsError> {
    let mut hasher = Sha256::new();
    hasher.update(public_key);
    let result = hasher.finalize();
    // Only the first 16 bytes of the digest are encoded, so this value will not
    // match `ssh-keygen -l` output, which encodes the full 32-byte SHA-256 digest.
    Ok(format!("SHA256:{}", base64::encode(&result[..16])))
}
```

**Security Benefits**:

- Verify key integrity
- Detect key tampering
- Match deployed keys to KMS records

## 2. RBAC Security

### 2.1 Execution Modes

**Security Model by Mode**:

| Mode | Security Level | Use Case | Audit Required |
| ---------- | -------------- | ---------------- | -------------- |
| Solo | Low | Single developer | No |
| MultiUser | Medium | Small teams | Optional |
| CICD | Medium | Automation | Yes |
| Enterprise | High | Production | Mandatory |

**Mode-Specific Security**:

#### Solo Mode

```rust
// Solo mode: All users are admin
// Security: Trust-based, no RBAC checks
if mode == ExecutionMode::Solo {
    return true; // Allow all operations
}
```

**Risks**:

- No access control
- No audit trail
- Single point of failure

**Mitigations**:

- Only for development environments
- Network isolation required
- Regular backups

#### MultiUser Mode

```rust
// Multi-user: Role-based access control
let permissions = rbac_manager.get_user_permissions(&user).await;
if !permissions.contains(&required_permission) {
    return Err(RbacError::PermissionDenied);
}
```

**Security Features**:

- Role-based permissions
- Optional audit logging
- Session management

#### CICD Mode

```rust
// CICD: Service account focused
// All actions logged for automation tracking
if mode == ExecutionMode::CICD {
    audit_log.log_automation_action(service_account, action).await;
}
```

**Security Features**:

- Service account isolation
- Mandatory audit logging
- Token-based authentication
- Short-lived credentials

#### Enterprise Mode

```rust
// Enterprise: Full security
// - Mandatory audit logging
// - Stricter session timeouts
// - Compliance reports
if mode == ExecutionMode::Enterprise {
    audit_log.log_with_compliance(user, action, compliance_tags).await;
}
```

**Security Features**:

- Full RBAC enforcement
- Comprehensive audit logging
- Compliance reporting
- Role assignment approval workflow

### 2.2 Permission System

**Permission Levels**:

```rust
Role::Admin => 100,         // Full access
Role::Operator => 80,       // Deploy & manage
Role::Developer => 60,      // Read + dev deploy
Role::ServiceAccount => 50, // Automation
Role::Auditor => 40,        // Read + audit
Role::Viewer => 20,         // Read-only
```

**Action Security Levels**:

```rust
Action::Delete => 100, // Destructive, admin only
Action::Manage => 80,  // Service management
Action::Deploy => 80,  // Deploy to production
Action::Create => 60,  // Create resources
Action::Update => 60,  // Modify resources
Action::Execute => 50, // Execute operations
Action::Audit => 40,   // View audit logs
Action::Read => 20,    // View resources
```

**Permission Check**:

```rust
// A role may perform an action when its level is at least the action's level.
pub fn can_perform(&self, required_level: u8) -> bool {
    self.permission_level() >= required_level
}
```

**Security Guarantees**:

- ✅ Least privilege by default (Viewer role)
- ✅ Hierarchical permissions (higher roles include lower)
- ✅ Explicit deny for unknown resources
- ✅ No permission escalation without admin

### 2.3 Session Security

**Session Configuration**:

```toml
[security]
session_timeout_minutes = 60 # Solo/MultiUser; set to 30 for Enterprise
max_sessions_per_user = 5
failed_login_lockout_attempts = 5
failed_login_lockout_duration_minutes = 15
```

**Session Lifecycle**:

1. User authenticates → JWT token issued
2. Token includes: user_id, role, issued_at, expires_at
3. Middleware validates token on each request
4. Session tracked in Redis/RocksDB
5. Session invalidated on logout or timeout

**Security Features**:

- JWT with RSA-2048 signatures
- Refresh token rotation
- Session fixation prevention
- Concurrent session limits

**Threat Mitigation**:

- ✅ **Session Hijacking**: Short-lived tokens (1 hour)
- ✅ **Token Replay**: One-time refresh tokens
- ✅ **Brute Force**: Account lockout after 5 failures
- ✅ **Session Fixation**: New session ID on login

### 2.4 Middleware Security

**RBAC Middleware Flow**:

```text
Request → Auth Middleware → RBAC Middleware → Handler
               ↓                  ↓
          Extract User     Check Permission
         from JWT Token   (role + resource + action)
                                  ↓
                            Allow / Deny
```

**Middleware Implementation**:

```rust
pub async fn check_permission(
    State(state): State<Arc<RbacMiddleware>>,
    resource: Resource,
    action: Action,
    req: Request,
    next: Next,
) -> Result<Response, RbacError> {
    // The auth middleware is expected to have stored the user in the request extensions.
    let user = req
        .extensions()
        .get::<User>()
        .ok_or_else(|| RbacError::UserNotFound("No user in request".to_string()))?;

    if !state.rbac_manager.check_permission(user, resource, action).await {
        return Err(RbacError::PermissionDenied);
    }

    Ok(next.run(req).await)
}
```

**Security Guarantees**:

- ✅ All API endpoints protected by default
- ✅ Permission checked before handler execution
- ✅ User context available in handlers
- ✅ Failed checks logged for audit

## 3. Platform Monitoring Security

### 3.1 Service Access Security

**Internal URLs Only**:

```toml
[platform]
orchestrator_url = "http://localhost:9090" # Not exposed externally
coredns_url = "http://localhost:9153"
gitea_url = "http://localhost:3000"
oci_registry_url = "http://localhost:5000"
```

**Network Security**:

- All services on localhost or internal network
- No external exposure of monitoring endpoints
- Firewall rules to prevent external access

**Threat Mitigation**:

- ✅ **External Scanning**: Services not reachable from internet
- ✅ **DDoS**: Internal-only access limits attack surface
- ✅ **Data Exfiltration**: Monitoring data not exposed externally

### 3.2 Health Check Security

**Timeout Protection**:

```rust
let client = Client::builder()
    .timeout(std::time::Duration::from_secs(5)) // Prevent hanging
    .build()
    .expect("failed to build HTTP client");
```

**Error Handling**:

```rust
// Never expose internal errors to users
Err(e) => {
    // Log detailed error internally
    tracing::error!("Health check failed for {}: {}", service, e);

    // Return generic error externally
    ServiceStatus {
        status: HealthStatus::Unhealthy,
        error_message: Some("Service unavailable".to_string()), // Generic
        ..Default::default() // Assumes ServiceStatus implements Default
    }
}
```

**Threat Mitigation**:

- ✅ **Timeout Attacks**: 5-second timeout prevents resource exhaustion
- ✅ **Information Disclosure**: Error messages sanitized
- ✅ **Resource Exhaustion**: Parallel checks with concurrency limits

### 3.3 Service Control Security

**RBAC-Protected Service Control**:

```rust
// Only Operator or Admin can start/stop services
#[axum::debug_handler]
pub async fn start_service(
    State(state): State<AppState>,
    Extension(user): Extension<User>,
    Path(service_type): Path<String>,
) -> Result<StatusCode, ApiError> {
    // Check permission
    if !rbac_manager.check_permission(
        &user,
        Resource::Service,
        Action::Manage,
    ).await {
        return Err(ApiError::PermissionDenied);
    }

    // Start service
    service_manager.start_service(&service_type).await?;

    // Audit log
    audit_log.log_service_action(user, service_type, "start").await;

    Ok(StatusCode::OK)
}
```

**Security Guarantees**:

- ✅ Only authorized users can control services
- ✅ All service actions logged
- ✅ Graceful degradation on service failure

## 4. Threat Model

### 4.1 High-Risk Threats

#### Threat: SSH Private Key Exposure

**Attack Vector**: Attacker gains access to KMS database

**Mitigations**:

- Private keys encrypted at rest with master key
- Master key stored in hardware security module (HSM) or KMS
- Key access audited and rate-limited
- Zeroization of decrypted keys in memory

**Detection**:

- Audit log monitoring for unusual key access patterns
- Alerting on bulk key retrievals

#### Threat: Privilege Escalation

**Attack Vector**: Lower-privileged user attempts to gain admin access

**Mitigations**:

- Role assignment requires Admin role
- Mode switching requires Admin role
- Middleware enforces permissions on every request
- No client-side permission checks (server-side only)

**Detection**:

- Failed permission checks logged
- Alerting on repeated permission denials

#### Threat: Session Hijacking

**Attack Vector**: Attacker steals JWT token

**Mitigations**:

- Short-lived access tokens (1 hour)
- Refresh token rotation
- Secure HTTP-only cookies (recommended)
- IP address binding (optional)

**Detection**:

- Unusual login locations
- Concurrent sessions from different IPs

### 4.2 Medium-Risk Threats

#### Threat: Service Impersonation

**Attack Vector**: Malicious service pretends to be legitimate platform service

**Mitigations**:

- Service URLs configured in config file (not dynamic)
- TLS certificate validation (if HTTPS)
- Service authentication tokens

**Detection**:

- Health check failures
- Metrics anomalies

#### Threat: Audit Log Tampering

**Attack Vector**: Attacker modifies audit logs to hide tracks

**Mitigations**:

- Audit logs write-only
- Logs stored in tamper-evident database (SurrealDB)
- Hash chain for log integrity
- Offsite log backup

**Detection**:

- Hash chain verification
- Log gap detection

### 4.3 Low-Risk Threats

#### Threat: Information Disclosure via Error Messages

**Attack Vector**: Error messages leak internal information

**Mitigations**:

- Generic error messages for users
- Detailed errors only in server logs
- Error message sanitization

**Detection**:

- Code review for error handling
- Automated scanning for sensitive data in responses

## 5. Compliance Considerations

### 5.1 GDPR Compliance

**Personal Data Handling**:

- User information: username, email, IP addresses
- Retention: Audit logs kept for required period
- Right to erasure: Deleting a user removes their SSH keys and user record; audit log entries are anonymized (PII removed) rather than deleted

**Implementation**:

```rust
// Delete user and all associated data
pub async fn delete_user(&self, user_id: &str) -> Result<(), RbacError> {
    // Delete user SSH keys
    for key in self.list_user_ssh_keys(user_id).await? {
        self.delete_ssh_key(&key.key_id).await?;
    }

    // Anonymize audit logs (retain for compliance, remove PII)
    self.anonymize_user_audit_logs(user_id).await?;

    // Delete user record
    self.delete_user_record(user_id).await?;

    Ok(())
}
```

### 5.2 SOC 2 Compliance

**Security Controls**:

- ✅ Access control (RBAC)
- ✅ Audit logging (all actions logged)
- ✅ Encryption at rest (KMS)
- ✅ Encryption in transit (HTTPS recommended)
- ✅ Session management (timeouts; MFA is planned for Phase 3)

**Monitoring & Alerting**:

- ✅ Service health monitoring
- ✅ Failed login tracking
- ✅ Permission denial alerting
- ✅ Unusual activity detection

### 5.3 PCI DSS (if applicable)

**Requirements**:

- ✅ Encrypt cardholder data (use KMS for keys)
- ✅ Maintain access control (RBAC)
- ✅ Track and monitor access (audit logs)
- ✅ Regularly test security (integration tests)

## 6. Security Best Practices

### 6.1 Development

**Code Review Checklist**:

- [ ] All API endpoints have RBAC middleware
- [ ] No hardcoded secrets or keys
- [ ] Error messages don't leak sensitive info
- [ ] Audit logging for sensitive operations
- [ ] Input validation on all user inputs
- [ ] SQL injection prevention (use parameterized queries)
- [ ] XSS prevention (escape user inputs)

**Testing**:

- Unit tests for permission checks
- Integration tests for RBAC enforcement
- Penetration testing for production deployments

### 6.2 Deployment

**Production Checklist**:

- [ ] Change default admin password
- [ ] Enable HTTPS with valid certificate
- [ ] Configure firewall rules (internal services only)
- [ ] Set appropriate execution mode (Enterprise for production)
- [ ] Enable audit logging
- [ ] Configure session timeout (30 minutes for Enterprise)
- [ ] Enable rate limiting
- [ ] Set up log monitoring and alerting
- [ ] Regular security updates
- [ ] Backup encryption keys

### 6.3 Operations

**Incident Response**:

1. **Detection**: Monitor audit logs for anomalies
2. **Containment**: Revoke compromised credentials
3. **Eradication**: Rotate affected SSH keys
4. **Recovery**: Restore from backup if needed
5. **Lessons Learned**: Update security controls

**Key Rotation Schedule**:

- SSH keys: Every 90 days (Enterprise: 30 days)
- JWT signing keys: Every 180 days
- Master encryption key: Every 365 days
- Service account tokens: Every 30 days

## 7. Security Metrics

### 7.1 Monitoring Metrics

**Authentication**:

- Failed login attempts per user
- Concurrent sessions per user
- Session duration (average, p95, p99)

**Authorization**:

- Permission denials per user
- Permission denials per resource
- Role assignments per day

**Audit**:

- SSH key accesses per day
- SSH key rotations per month
- Audit log retention compliance

**Services**:

- Service health check success rate
- Service response times (p50, p95, p99)
- Service dependency failures

### 7.2 Alerting Thresholds

**Critical Alerts**:

- Service health: >3 failures in 5 minutes
- Failed logins: >10 attempts in 1 minute
- Permission denials: >50 in 1 minute
- SSH key bulk retrieval: >10 keys in 1 minute

**Warning Alerts**:

- Service degraded: response time >1 second
- Session timeout rate: >10% of sessions
- Audit log storage: >80% capacity

## 8. Security Roadmap

### Phase 1 (Completed)

- ✅ SSH key storage with encryption
- ✅ Mode-based RBAC
- ✅ Audit logging
- ✅ Platform monitoring

### Phase 2 (In Progress)

- 📋 API handlers with RBAC enforcement
- 📋 Integration tests for security
- 📋 Documentation

### Phase 3 (Future)

- Multi-factor authentication (MFA)
- Hardware security module (HSM) integration
- Advanced threat detection (ML-based)
- Automated security scanning
- Compliance report generation
- Security information and event management (SIEM) integration

## References

- **OWASP Top 10**: <https://owasp.org/www-project-top-ten/>
- **NIST Cybersecurity Framework**: <https://www.nist.gov/cyberframework>
- **CIS Controls**: <https://www.cisecurity.org/controls>
- **GDPR**: <https://gdpr.eu/>
- **SOC 2**: <https://www.aicpa.org/interestareas/frc/assuranceadvisoryservices/socforserviceorganizations.html>

---

**Last Updated**: 2025-10-06
**Review Cycle**: Quarterly
**Next Review**: 2026-01-06