# Security Considerations for Control Center Enhancements

## Overview

This document outlines the security architecture and considerations for the control-center enhancements, including KMS SSH key management, mode-based RBAC, and platform service monitoring.

## 1. SSH Key Management Security

### 1.1 Key Storage Security

**Implementation**:
- Private keys encrypted at rest using AES-256-GCM in KMS
- Public keys stored in plaintext (as they are meant to be public)
- Private key material never exposed in API responses
- Key IDs used as references, not actual keys

**Threat Mitigation**:
- ✅ **Data at Rest**: All private keys encrypted with master encryption key
- ✅ **Key Exposure**: Private keys only decrypted in memory when needed
- ✅ **Key Leakage**: Zeroization of key material after use
- ✅ **Unauthorized Access**: KMS access controlled by RBAC

**Best Practices**:
```rust
// Good: Using key ID reference
let key_id = ssh_key_manager.store_ssh_key(name, private, public, purpose, tags).await?;

// Bad: Never do this - exposing private key in logs
tracing::info!("Stored key: {}", private_key); // DON'T DO THIS
```

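One way to make the "never log private keys" and "zeroize after use" rules hard to violate is to wrap decrypted key material in a type that redacts itself when formatted and wipes its buffer on drop. The sketch below is illustrative only: `SecretBytes` and `expose` are hypothetical names, not part of the control-center API, and it uses only the standard library.

```rust
use std::fmt;
use std::sync::atomic::{compiler_fence, Ordering};

/// Hypothetical wrapper for decrypted key material (not the project's real type).
pub struct SecretBytes(Vec<u8>);

impl SecretBytes {
    pub fn new(bytes: Vec<u8>) -> Self {
        SecretBytes(bytes)
    }

    /// Explicit accessor, so every read of the secret is easy to find in review.
    pub fn expose(&self) -> &[u8] {
        &self.0
    }
}

// Debug output never contains the bytes, so an accidental `{:?}` in a log line is harmless.
impl fmt::Debug for SecretBytes {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "SecretBytes([REDACTED; {} bytes])", self.0.len())
    }
}

impl Drop for SecretBytes {
    fn drop(&mut self) {
        // Volatile writes keep the compiler from optimizing the wipe away.
        for byte in self.0.iter_mut() {
            unsafe { std::ptr::write_volatile(byte, 0) };
        }
        compiler_fence(Ordering::SeqCst);
    }
}
```

Because the type does not implement `Display`, a call such as `tracing::info!("{}", secret)` would fail to compile. A production implementation would more likely rely on an audited crate such as `zeroize` than on a hand-written wipe.
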
### 1.2 Key Rotation Security

**Implementation**:
- Configurable rotation intervals (default 90 days)
- Grace period for old key usage (default 7 days)
- Automatic rotation scheduling (if enabled)
- Manual rotation support with immediate effect

**Threat Mitigation**:
- ✅ **Key Compromise**: Regular rotation limits exposure window
- ✅ **Stale Keys**: Automated detection of keys due for rotation
- ✅ **Rotation Failures**: Graceful degradation with error logging

**Rotation Policy**:
```toml
[kms.ssh_keys]
rotation_enabled = true
rotation_interval_days = 90  # Enterprise: 30, Dev: 180
grace_period_days = 7        # Time to update deployed keys
auto_rotate = false          # Manual approval recommended
```

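The interaction between the rotation interval and the grace period can be sketched as a small state function. This is a minimal illustration, assuming key age and time since rotation are tracked in whole days and that the grace period is inclusive; `RotationPolicy`, `KeyState`, and `key_state` are hypothetical names, not the KMS implementation.

```rust
/// Mirrors the two numeric `[kms.ssh_keys]` settings; the struct itself is illustrative.
#[derive(Debug, Clone, Copy)]
pub struct RotationPolicy {
    pub rotation_interval_days: u32,
    pub grace_period_days: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum KeyState {
    /// Key is younger than the rotation interval.
    Active,
    /// Interval has elapsed and the key has not been rotated yet.
    DueForRotation,
    /// Key was rotated, but the old key is still accepted.
    InGracePeriod,
    /// Key was rotated and the grace period is over.
    Retired,
}

/// `rotated_days_ago` is `None` while the key is still the current one.
pub fn key_state(policy: RotationPolicy, age_days: u32, rotated_days_ago: Option<u32>) -> KeyState {
    match rotated_days_ago {
        Some(days) if days <= policy.grace_period_days => KeyState::InGracePeriod,
        Some(_) => KeyState::Retired,
        None if age_days >= policy.rotation_interval_days => KeyState::DueForRotation,
        None => KeyState::Active,
    }
}
```

With the defaults above (90-day interval, 7-day grace period), a 90-day-old key that has not been rotated is reported as `DueForRotation`, and a key rotated 8 days ago is `Retired`.
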
### 1.3 Audit Logging

**Logged Events**:
- SSH key creation (who, when, purpose)
- SSH key retrieval (who accessed, when)
- SSH key rotation (old key ID, new key ID)
- SSH key deletion (who deleted, when)
- Failed access attempts

**Audit Entry Structure**:
```rust
pub struct SshKeyAuditEntry {
    pub timestamp: DateTime<Utc>,
    pub key_id: String,
    pub action: SshKeyAction,
    pub user: Option<String>,        // User who performed action
    pub ip_address: Option<String>,  // Source IP
    pub success: bool,
    pub error_message: Option<String>,
}
```

**Threat Mitigation**:
- ✅ **Unauthorized Access**: Full audit trail for forensics
- ✅ **Insider Threats**: User attribution for all actions
- ✅ **Compliance**: Audit trail supports GDPR and SOC 2 logging requirements

**Audit Log Retention**:
- In-memory: Last 10,000 entries
- Persistent: SurrealDB with 1-year retention
- Compliance mode: 7-year retention (configurable)

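The "last 10,000 entries" in-memory tier behaves like a bounded FIFO buffer. The following is a minimal sketch using `std::collections::VecDeque`; `AuditBuffer` is a hypothetical name, and returning the evicted entry (so a caller could confirm it reached persistent storage) is an assumption rather than documented behavior.

```rust
use std::collections::VecDeque;

/// Illustrative bounded buffer that keeps only the most recent `capacity` entries.
pub struct AuditBuffer<T> {
    entries: VecDeque<T>,
    capacity: usize,
}

impl<T> AuditBuffer<T> {
    pub fn new(capacity: usize) -> Self {
        AuditBuffer { entries: VecDeque::with_capacity(capacity), capacity }
    }

    /// Appends an entry and returns the oldest one if it had to be evicted.
    pub fn push(&mut self, entry: T) -> Option<T> {
        if self.capacity == 0 {
            return Some(entry); // nothing can be retained
        }
        let evicted = if self.entries.len() >= self.capacity {
            self.entries.pop_front()
        } else {
            None
        };
        self.entries.push_back(entry);
        evicted
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }

    /// Iterates from oldest to newest retained entry.
    pub fn iter(&self) -> impl Iterator<Item = &T> {
        self.entries.iter()
    }
}
```

In the configuration described above the buffer would be created with `AuditBuffer::new(10_000)`.
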
### 1.4 Key Fingerprinting

**Implementation**:
```rust
fn calculate_fingerprint(public_key: &[u8]) -> Result<String, KmsError> {
    use sha2::{Sha256, Digest};
    let mut hasher = Sha256::new();
    hasher.update(public_key);
    let result = hasher.finalize();
    // Only the first 16 bytes of the digest are encoded, so this value will not
    // match the fingerprint printed by `ssh-keygen -l`, which covers all 32 bytes.
    Ok(format!("SHA256:{}", base64::encode(&result[..16])))
}
```

**Security Benefits**:
- Verify key integrity
- Detect key tampering
- Match deployed keys to KMS records

## 2. RBAC Security

### 2.1 Execution Modes

**Security Model by Mode**:

| Mode | Security Level | Use Case | Audit Required |
|------|---------------|----------|----------------|
| Solo | Low | Single developer | No |
| MultiUser | Medium | Small teams | Optional |
| CICD | Medium | Automation | Yes |
| Enterprise | High | Production | Mandatory |

**Mode-Specific Security**:

#### Solo Mode
```rust
// Solo mode: All users are admin
// Security: Trust-based, no RBAC checks
if mode == ExecutionMode::Solo {
    return true; // Allow all operations
}
```

**Risks**:
- No access control
- No audit trail
- Single point of failure

**Mitigations**:
- Only for development environments
- Network isolation required
- Regular backups

#### MultiUser Mode
```rust
// Multi-user: Role-based access control
let permissions = rbac_manager.get_user_permissions(&user).await;
if !permissions.contains(&required_permission) {
    return Err(RbacError::PermissionDenied);
}
```

**Security Features**:
- Role-based permissions
- Optional audit logging
- Session management

#### CICD Mode
```rust
// CICD: Service account focused
// All actions logged for automation tracking
if mode == ExecutionMode::CICD {
    audit_log.log_automation_action(service_account, action).await;
}
```

**Security Features**:
- Service account isolation
- Mandatory audit logging
- Token-based authentication
- Short-lived credentials

#### Enterprise Mode
```rust
// Enterprise: Full security
// - Mandatory audit logging
// - Stricter session timeouts
// - Compliance reports
if mode == ExecutionMode::Enterprise {
    audit_log.log_with_compliance(user, action, compliance_tags).await;
}
```

**Security Features**:
- Full RBAC enforcement
- Comprehensive audit logging
- Compliance reporting
- Role assignment approval workflow

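The per-mode rules in this section can be collected into one lookup, which keeps mode checks from being scattered across handlers. This is a sketch only: `AuditPolicy`, `audit_policy`, and `rbac_enforced` are hypothetical names, and the mapping simply restates the table and mode descriptions above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExecutionMode {
    Solo,
    MultiUser,
    CICD,
    Enterprise,
}

/// Illustrative audit requirement per mode (not a type from the codebase).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AuditPolicy {
    Disabled,
    Optional,
    Mandatory,
}

/// Maps each mode to the audit requirement listed for it in section 2.1.
pub fn audit_policy(mode: ExecutionMode) -> AuditPolicy {
    match mode {
        ExecutionMode::Solo => AuditPolicy::Disabled,
        ExecutionMode::MultiUser => AuditPolicy::Optional,
        ExecutionMode::CICD | ExecutionMode::Enterprise => AuditPolicy::Mandatory,
    }
}

/// Solo mode skips RBAC checks entirely; every other mode enforces them.
pub fn rbac_enforced(mode: ExecutionMode) -> bool {
    mode != ExecutionMode::Solo
}
```

Using an exhaustive `match` means that adding a new mode forces a compile-time decision about its audit requirement.
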
### 2.2 Permission System

**Permission Levels**:
```rust
Role::Admin => 100,          // Full access
Role::Operator => 80,        // Deploy & manage
Role::Developer => 60,       // Read + dev deploy
Role::ServiceAccount => 50,  // Automation
Role::Auditor => 40,         // Read + audit
Role::Viewer => 20,          // Read-only
```

**Action Security Levels**:
```rust
Action::Delete => 100,   // Destructive, admin only
Action::Manage => 80,    // Service management
Action::Deploy => 80,    // Deploy to production
Action::Create => 60,    // Create resources
Action::Update => 60,    // Modify resources
Action::Execute => 50,   // Execute operations
Action::Audit => 40,     // View audit logs
Action::Read => 20,      // View resources
```

**Permission Check**:
```rust
pub fn can_perform(&self, required_level: u8) -> bool {
    self.permission_level() >= required_level
}
```

**Security Guarantees**:
- ✅ Least privilege by default (Viewer role)
- ✅ Hierarchical permissions (higher roles include lower)
- ✅ Explicit deny for unknown resources
- ✅ No permission escalation without admin

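Putting the two level tables and the `can_perform` check together gives a self-contained worked example. The numeric levels are taken from the tables above; `required_level` and `is_allowed` are hypothetical helper names added for illustration.

```rust
#[derive(Debug, Clone, Copy)]
pub enum Role { Admin, Operator, Developer, ServiceAccount, Auditor, Viewer }

#[derive(Debug, Clone, Copy)]
pub enum Action { Delete, Manage, Deploy, Create, Update, Execute, Audit, Read }

impl Role {
    pub fn permission_level(self) -> u8 {
        match self {
            Role::Admin => 100,
            Role::Operator => 80,
            Role::Developer => 60,
            Role::ServiceAccount => 50,
            Role::Auditor => 40,
            Role::Viewer => 20,
        }
    }

    pub fn can_perform(self, required_level: u8) -> bool {
        self.permission_level() >= required_level
    }
}

impl Action {
    /// Minimum role level needed for the action (hypothetical helper name).
    pub fn required_level(self) -> u8 {
        match self {
            Action::Delete => 100,
            Action::Manage | Action::Deploy => 80,
            Action::Create | Action::Update => 60,
            Action::Execute => 50,
            Action::Audit => 40,
            Action::Read => 20,
        }
    }
}

pub fn is_allowed(role: Role, action: Action) -> bool {
    role.can_perform(action.required_level())
}
```

With these numbers a Developer (60) may create resources but not deploy (80), and only an Admin may delete. A purely numeric hierarchy also means a ServiceAccount (50) passes the Audit (40) check; if that is not intended, the check would need a per-role allow list in addition to the level comparison.
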
### 2.3 Session Security

**Session Configuration**:
```toml
[security]
# Solo/MultiUser default; set to 30 for Enterprise.
# (A key may only appear once per TOML table.)
session_timeout_minutes = 60
max_sessions_per_user = 5
failed_login_lockout_attempts = 5
failed_login_lockout_duration_minutes = 15
```

**Session Lifecycle**:
1. User authenticates → JWT token issued
2. Token includes: user_id, role, issued_at, expires_at
3. Middleware validates token on each request
4. Session tracked in Redis/RocksDB
5. Session invalidated on logout or timeout

**Security Features**:
- JWT with RSA-2048 signatures
- Refresh token rotation
- Session fixation prevention
- Concurrent session limits

**Threat Mitigation**:
- ✅ **Session Hijacking**: Short-lived tokens (1 hour)
- ✅ **Token Replay**: One-time refresh tokens
- ✅ **Brute Force**: Account lockout after 5 failures
- ✅ **Session Fixation**: New session ID on login

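The lockout settings (5 failed attempts, 15-minute lockout) could be enforced along the following lines. This is a sketch under stated assumptions: `LockoutTracker` is a hypothetical in-memory type, the current time is passed in explicitly so the logic is testable, and a real deployment would keep this state in the session store rather than a local map.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Illustrative per-user failed-login tracker (not the project's implementation).
pub struct LockoutTracker {
    max_attempts: u32,
    lockout: Duration,
    // user -> (consecutive failures, locked until)
    state: HashMap<String, (u32, Option<Instant>)>,
}

impl LockoutTracker {
    pub fn new(max_attempts: u32, lockout: Duration) -> Self {
        LockoutTracker { max_attempts, lockout, state: HashMap::new() }
    }

    pub fn is_locked(&self, user: &str, now: Instant) -> bool {
        matches!(self.state.get(user), Some((_, Some(until))) if now < *until)
    }

    pub fn record_failure(&mut self, user: &str, now: Instant) {
        let entry = self.state.entry(user.to_string()).or_insert((0, None));
        // An expired lockout starts a fresh count.
        if let Some(until) = entry.1 {
            if now >= until {
                *entry = (0, None);
            }
        }
        entry.0 += 1;
        if entry.0 >= self.max_attempts {
            entry.1 = Some(now + self.lockout);
        }
    }

    /// A successful login clears the failure count.
    pub fn record_success(&mut self, user: &str) {
        self.state.remove(user);
    }
}
```

A login handler would call `is_locked` before verifying credentials, then `record_failure` or `record_success` depending on the outcome.
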
### 2.4 Middleware Security

**RBAC Middleware Flow**:
```
Request → Auth Middleware → RBAC Middleware → Handler
              ↓                    ↓
         Extract User         Check Permission
         from JWT Token       (role + resource + action)
                                   ↓
                              Allow / Deny
```

**Middleware Implementation**:
```rust
pub async fn check_permission(
    State(state): State<Arc<RbacMiddleware>>,
    resource: Resource,
    action: Action,
    req: Request,
    next: Next,
) -> Result<Response, RbacError> {
    // The auth middleware is expected to have stored the user in the request extensions
    let user = req.extensions()
        .get::<User>()
        .ok_or_else(|| RbacError::UserNotFound("No user in request".to_string()))?;

    if !state.rbac_manager.check_permission(&user, resource, action).await {
        return Err(RbacError::PermissionDenied);
    }

    Ok(next.run(req).await)
}
```

**Security Guarantees**:
- ✅ All API endpoints protected by default
- ✅ Permission checked before handler execution
- ✅ User context available in handlers
- ✅ Failed checks logged for audit

## 3. Platform Monitoring Security

### 3.1 Service Access Security

**Internal URLs Only**:
```toml
[platform]
orchestrator_url = "http://localhost:8080"  # Not exposed externally
coredns_url = "http://localhost:9153"
gitea_url = "http://localhost:3000"
oci_registry_url = "http://localhost:5000"
```

**Network Security**:
- All services on localhost or internal network
- No external exposure of monitoring endpoints
- Firewall rules to prevent external access

**Threat Mitigation**:
- ✅ **External Scanning**: Services not reachable from internet
- ✅ **DDoS**: Internal-only access limits attack surface
- ✅ **Data Exfiltration**: Monitoring data not exposed externally

### 3.2 Health Check Security

**Timeout Protection**:
```rust
let client = Client::builder()
    .timeout(std::time::Duration::from_secs(5)) // Prevent hanging
    .build()
    .expect("failed to build HTTP client");
```

**Error Handling**:
```rust
// Never expose internal errors to users
Err(e) => {
    // Log detailed error internally
    tracing::error!("Health check failed for {}: {}", service, e);

    // Return generic error externally
    ServiceStatus {
        status: HealthStatus::Unhealthy,
        error_message: Some("Service unavailable".to_string()), // Generic
        .. // remaining fields elided
    }
}
```

**Threat Mitigation**:
- ✅ **Timeout Attacks**: 5-second timeout prevents resource exhaustion
- ✅ **Information Disclosure**: Error messages sanitized
- ✅ **Resource Exhaustion**: Parallel checks with concurrency limits

### 3.3 Service Control Security

**RBAC-Protected Service Control**:
```rust
// Only Operator or Admin can start/stop services
#[axum::debug_handler]
pub async fn start_service(
    State(state): State<AppState>,
    Extension(user): Extension<User>,
    Path(service_type): Path<String>,
) -> Result<StatusCode, ApiError> {
    // Check permission
    if !rbac_manager.check_permission(
        &user,
        Resource::Service,
        Action::Manage,
    ).await {
        return Err(ApiError::PermissionDenied);
    }

    // Start service
    service_manager.start_service(&service_type).await?;

    // Audit log
    audit_log.log_service_action(user, service_type, "start").await;

    Ok(StatusCode::OK)
}
```

**Security Guarantees**:
- ✅ Only authorized users can control services
- ✅ All service actions logged
- ✅ Graceful degradation on service failure

## 4. Threat Model

### 4.1 High-Risk Threats

#### Threat: SSH Private Key Exposure
**Attack Vector**: Attacker gains access to KMS database

**Mitigations**:
- Private keys encrypted at rest with master key
- Master key stored in hardware security module (HSM) or KMS
- Key access audited and rate-limited
- Zeroization of decrypted keys in memory

**Detection**:
- Audit log monitoring for unusual key access patterns
- Alerting on bulk key retrievals

#### Threat: Privilege Escalation
**Attack Vector**: Lower-privileged user attempts to gain admin access

**Mitigations**:
- Role assignment requires Admin role
- Mode switching requires Admin role
- Middleware enforces permissions on every request
- No client-side permission checks (server-side only)

**Detection**:
- Failed permission checks logged
- Alerting on repeated permission denials

#### Threat: Session Hijacking
**Attack Vector**: Attacker steals JWT token

**Mitigations**:
- Short-lived access tokens (1 hour)
- Refresh token rotation
- Secure HTTP-only cookies (recommended)
- IP address binding (optional)

**Detection**:
- Unusual login locations
- Concurrent sessions from different IPs

### 4.2 Medium-Risk Threats

#### Threat: Service Impersonation
**Attack Vector**: Malicious service pretends to be legitimate platform service

**Mitigations**:
- Service URLs configured in config file (not dynamic)
- TLS certificate validation (if HTTPS)
- Service authentication tokens

**Detection**:
- Health check failures
- Metrics anomalies

#### Threat: Audit Log Tampering
**Attack Vector**: Attacker modifies audit logs to hide tracks

**Mitigations**:
- Audit logs are append-only (entries cannot be modified or deleted)
- Logs stored in SurrealDB, with a hash chain over entries for tamper evidence
- Offsite log backup

**Detection**:
- Hash chain verification
- Log gap detection

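A hash chain makes tampering detectable because each entry commits to the hash of the entry before it. The sketch below shows the mechanism only. As a loudly stated assumption, it uses the standard library's `DefaultHasher` as a stand-in so the example has no dependencies; that hasher is not cryptographic, and a real audit log would use SHA-256 or similar. `ChainedEntry`, `append`, and `first_broken_link` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone)]
pub struct ChainedEntry {
    pub message: String,
    pub prev_hash: u64,
    pub hash: u64,
}

// Stand-in hash: NOT cryptographically secure, used here only to show the chaining.
fn link_hash(prev_hash: u64, message: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    prev_hash.hash(&mut hasher);
    message.hash(&mut hasher);
    hasher.finish()
}

/// Appends a message, linking it to the hash of the previous entry (0 for the first).
pub fn append(log: &mut Vec<ChainedEntry>, message: &str) {
    let prev_hash = log.last().map_or(0, |e| e.hash);
    log.push(ChainedEntry {
        message: message.to_string(),
        prev_hash,
        hash: link_hash(prev_hash, message),
    });
}

/// Returns the index of the first entry that breaks the chain, if any.
pub fn first_broken_link(log: &[ChainedEntry]) -> Option<usize> {
    let mut expected_prev = 0u64;
    for (i, entry) in log.iter().enumerate() {
        let recomputed = link_hash(entry.prev_hash, &entry.message);
        if entry.prev_hash != expected_prev || entry.hash != recomputed {
            return Some(i);
        }
        expected_prev = entry.hash;
    }
    None
}
```

Editing or removing an entry in the middle of the log is reported at the first affected index. Truncating the tail is not detectable from the chain alone, which is one reason the latest hash would also need to be anchored somewhere an attacker cannot reach, such as the offsite backup.
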
### 4.3 Low-Risk Threats

#### Threat: Information Disclosure via Error Messages
**Attack Vector**: Error messages leak internal information

**Mitigations**:
- Generic error messages for users
- Detailed errors only in server logs
- Error message sanitization

**Detection**:
- Code review for error handling
- Automated scanning for sensitive data in responses

## 5. Compliance Considerations

### 5.1 GDPR Compliance

**Personal Data Handling**:
- User information: username, email, IP addresses
- Retention: Audit logs kept for required period
- Right to erasure: User deletion removes the user's SSH keys and user record, and anonymizes the associated audit log entries

**Implementation**:
```rust
// Delete the user's data; audit logs are anonymized rather than deleted
pub async fn delete_user(&self, user_id: &str) -> Result<(), RbacError> {
    // Delete user SSH keys
    for key in self.list_user_ssh_keys(user_id).await? {
        self.delete_ssh_key(&key.key_id).await?;
    }

    // Anonymize audit logs (retain for compliance, remove PII)
    self.anonymize_user_audit_logs(user_id).await?;

    // Delete user record
    self.delete_user_record(user_id).await?;

    Ok(())
}
```

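The anonymization step can be pictured as clearing the personal fields on every audit record attributed to the deleted user while keeping the record itself. The sketch below is an assumption about how that might look, using a simplified, synchronous `AuditRecord` with a subset of the audit entry fields; it is not the body of `anonymize_user_audit_logs`.

```rust
/// Simplified stand-in for an audit entry (hypothetical, subset of fields).
#[derive(Debug, Clone, PartialEq)]
pub struct AuditRecord {
    pub key_id: String,
    pub user: Option<String>,
    pub ip_address: Option<String>,
    pub success: bool,
}

/// Clears PII on every record attributed to `user_id` and returns how many were changed.
pub fn anonymize_user_records(records: &mut [AuditRecord], user_id: &str) -> usize {
    let mut changed = 0;
    for record in records.iter_mut() {
        if record.user.as_deref() == Some(user_id) {
            // Keep the event itself (key, outcome); drop who did it and from where.
            record.user = None;
            record.ip_address = None;
            changed += 1;
        }
    }
    changed
}
```

If audit entries are also covered by a hash chain, rewriting them in place would invalidate the chain, so anonymization and integrity protection need to be designed together.
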
### 5.2 SOC 2 Compliance

**Security Controls**:
- ✅ Access control (RBAC)
- ✅ Audit logging (all actions logged)
- ✅ Encryption at rest (KMS)
- ✅ Encryption in transit (HTTPS recommended)
- ✅ Session management (timeouts; MFA planned for Phase 3)

**Monitoring & Alerting**:
- ✅ Service health monitoring
- ✅ Failed login tracking
- ✅ Permission denial alerting
- ✅ Unusual activity detection

### 5.3 PCI DSS (if applicable)

**Requirements**:
- ✅ Encrypt cardholder data (use KMS for keys)
- ✅ Maintain access control (RBAC)
- ✅ Track and monitor access (audit logs)
- ✅ Regularly test security (integration tests)

## 6. Security Best Practices

### 6.1 Development

**Code Review Checklist**:
- [ ] All API endpoints have RBAC middleware
- [ ] No hardcoded secrets or keys
- [ ] Error messages don't leak sensitive info
- [ ] Audit logging for sensitive operations
- [ ] Input validation on all user inputs
- [ ] SQL injection prevention (use parameterized queries)
- [ ] XSS prevention (escape user inputs)

**Testing**:
- Unit tests for permission checks
- Integration tests for RBAC enforcement
- Penetration testing for production deployments

### 6.2 Deployment

**Production Checklist**:
- [ ] Change default admin password
- [ ] Enable HTTPS with valid certificate
- [ ] Configure firewall rules (internal services only)
- [ ] Set appropriate execution mode (Enterprise for production)
- [ ] Enable audit logging
- [ ] Configure session timeout (30 minutes for Enterprise)
- [ ] Enable rate limiting
- [ ] Set up log monitoring and alerting
- [ ] Regular security updates
- [ ] Backup encryption keys

### 6.3 Operations

**Incident Response**:
1. **Detection**: Monitor audit logs for anomalies
2. **Containment**: Revoke compromised credentials
3. **Eradication**: Rotate affected SSH keys
4. **Recovery**: Restore from backup if needed
5. **Lessons Learned**: Update security controls

**Key Rotation Schedule**:
- SSH keys: Every 90 days (Enterprise: 30 days)
- JWT signing keys: Every 180 days
- Master encryption key: Every 365 days
- Service account tokens: Every 30 days

## 7. Security Metrics

### 7.1 Monitoring Metrics

**Authentication**:
- Failed login attempts per user
- Concurrent sessions per user
- Session duration (average, p95, p99)

**Authorization**:
- Permission denials per user
- Permission denials per resource
- Role assignments per day

**Audit**:
- SSH key accesses per day
- SSH key rotations per month
- Audit log retention compliance

**Services**:
- Service health check success rate
- Service response times (p50, p95, p99)
- Service dependency failures

### 7.2 Alerting Thresholds

**Critical Alerts**:
- Service health: >3 failures in 5 minutes
- Failed logins: >10 attempts in 1 minute
- Permission denials: >50 in 1 minute
- SSH key bulk retrieval: >10 keys in 1 minute

**Warning Alerts**:
- Service degraded: response time >1 second
- Session timeout rate: >10% of sessions
- Audit log storage: >80% capacity

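Thresholds of the form "more than N events in T" are typically evaluated with a sliding window. The following is a minimal sketch of that idea, assuming events are timestamped with a monotonic clock and evaluated in order; `SlidingWindowAlert` is a hypothetical name rather than part of the monitoring code.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Fires when more than `threshold` events fall inside the trailing `window`.
pub struct SlidingWindowAlert {
    threshold: usize,
    window: Duration,
    events: VecDeque<Instant>,
}

impl SlidingWindowAlert {
    pub fn new(threshold: usize, window: Duration) -> Self {
        SlidingWindowAlert { threshold, window, events: VecDeque::new() }
    }

    /// Records an event at `now` and returns true if the alert condition is met.
    pub fn record(&mut self, now: Instant) -> bool {
        // Drop events that have aged out of the window.
        while let Some(&oldest) = self.events.front() {
            if now.duration_since(oldest) >= self.window {
                self.events.pop_front();
            } else {
                break;
            }
        }
        self.events.push_back(now);
        self.events.len() > self.threshold
    }
}
```

For the failed-login rule above this would be `SlidingWindowAlert::new(10, Duration::from_secs(60))`, with one instance per monitored user or source.
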
## 8. Security Roadmap

### Phase 1 (Completed)
- ✅ SSH key storage with encryption
- ✅ Mode-based RBAC
- ✅ Audit logging
- ✅ Platform monitoring

### Phase 2 (In Progress)
- 📋 API handlers with RBAC enforcement
- 📋 Integration tests for security
- 📋 Documentation

### Phase 3 (Future)
- Multi-factor authentication (MFA)
- Hardware security module (HSM) integration
- Advanced threat detection (ML-based)
- Automated security scanning
- Compliance report generation
- Security information and event management (SIEM) integration

## References

- **OWASP Top 10**: https://owasp.org/www-project-top-ten/
- **NIST Cybersecurity Framework**: https://www.nist.gov/cyberframework
- **CIS Controls**: https://www.cisecurity.org/controls
- **GDPR**: https://gdpr.eu/
- **SOC 2**: https://www.aicpa.org/interestareas/frc/assuranceadvisoryservices/socforserviceorganizations.html

---

**Last Updated**: 2025-10-06
**Review Cycle**: Quarterly
**Next Review**: 2026-01-06