# Security Considerations for Control Center Enhancements

## Overview

This document outlines the security architecture and considerations for the control-center enhancements, including KMS SSH key management, mode-based RBAC, and platform service monitoring.

## 1. SSH Key Management Security

### 1.1 Key Storage Security

**Implementation**:
- Private keys encrypted at rest using AES-256-GCM in KMS
- Public keys stored in plaintext (as they are meant to be public)
- Private key material never exposed in API responses
- Key IDs used as references, not actual keys

**Threat Mitigation**:
- ✅ **Data at Rest**: All private keys encrypted with master encryption key
- ✅ **Key Exposure**: Private keys only decrypted in memory when needed
- ✅ **Key Leakage**: Zeroization of key material after use
- ✅ **Unauthorized Access**: KMS access controlled by RBAC

**Best Practices**:
```rust
// Good: Using key ID reference
let key_id = ssh_key_manager.store_ssh_key(name, private, public, purpose, tags).await?;

// Bad: Never do this - exposing private key in logs
tracing::info!("Stored key: {}", private_key);  // DON'T DO THIS
```
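
One way to make the logging mistake above harder to commit is to hold decrypted key material in a wrapper whose `Debug` output is redacted and whose bytes are overwritten on drop. The sketch below is illustrative only: `SecretBytes` and its methods are hypothetical names, not part of the documented `ssh_key_manager` API, and it uses only the standard library.

```rust
use std::fmt;
use std::sync::atomic::{compiler_fence, Ordering};

/// Hypothetical wrapper for decrypted key material (not the project's API).
pub struct SecretBytes(Vec<u8>);

impl SecretBytes {
    pub fn new(bytes: Vec<u8>) -> Self {
        SecretBytes(bytes)
    }

    /// Borrow the raw bytes for the short window in which they are needed.
    pub fn expose(&self) -> &[u8] {
        &self.0
    }

    /// Overwrite every byte with zero. Volatile writes keep the compiler
    /// from eliding the stores as dead code.
    pub fn zeroize(&mut self) {
        for byte in self.0.iter_mut() {
            // SAFETY: `byte` is a valid, aligned, exclusive reference.
            unsafe { std::ptr::write_volatile(byte, 0) };
        }
        compiler_fence(Ordering::SeqCst);
    }
}

// Zeroize automatically when the value goes out of scope.
impl Drop for SecretBytes {
    fn drop(&mut self) {
        self.zeroize();
    }
}

// `{:?}` never prints the contents, so an accidental log line stays safe.
impl fmt::Debug for SecretBytes {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("SecretBytes([REDACTED])")
    }
}
```

This does not cover copies left behind by earlier reallocations of the underlying `Vec`; a maintained crate such as `zeroize` is likely a better fit for production use.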

### 1.2 Key Rotation Security

**Implementation**:
- Configurable rotation intervals (default 90 days)
- Grace period for old key usage (default 7 days)
- Automatic rotation scheduling (if enabled)
- Manual rotation support with immediate effect

**Threat Mitigation**:
- ✅ **Key Compromise**: Regular rotation limits exposure window
- ✅ **Stale Keys**: Automated detection of keys due for rotation
- ✅ **Rotation Failures**: Graceful degradation with error logging

**Rotation Policy**:
```toml
[kms.ssh_keys]
rotation_enabled = true
rotation_interval_days = 90   # Enterprise: 30, Dev: 180
grace_period_days = 7          # Time to update deployed keys
auto_rotate = false            # Manual approval recommended
```
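
As a rough illustration of how the interval and grace period could interact, the sketch below classifies a key by its age. `RotationPolicy`, `KeyState`, and `classify` are hypothetical names, and the sketch assumes the grace period starts on the day rotation becomes due; the actual KMS may count it from the moment a rotation is performed.

```rust
/// Hypothetical mirror of the `[kms.ssh_keys]` settings.
#[derive(Debug, Clone, Copy)]
pub struct RotationPolicy {
    pub rotation_interval_days: u32,
    pub grace_period_days: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum KeyState {
    /// Younger than the rotation interval.
    Current,
    /// Past the interval, but the grace period has not run out yet.
    DueForRotation { grace_days_left: u32 },
    /// Past the interval plus the grace period.
    Expired,
}

impl RotationPolicy {
    /// Classify a key by its age in whole days.
    /// Assumption: the grace period begins when rotation becomes due.
    pub fn classify(&self, key_age_days: u32) -> KeyState {
        if key_age_days < self.rotation_interval_days {
            return KeyState::Current;
        }
        let overdue = key_age_days - self.rotation_interval_days;
        if overdue < self.grace_period_days {
            KeyState::DueForRotation {
                grace_days_left: self.grace_period_days - overdue,
            }
        } else {
            KeyState::Expired
        }
    }
}
```

With the defaults above (90 and 7), a key aged 89 days is `Current`, one aged 90 days is `DueForRotation` with 7 days left, and one aged 97 days is `Expired`.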

### 1.3 Audit Logging

**Logged Events**:
- SSH key creation (who, when, purpose)
- SSH key retrieval (who accessed, when)
- SSH key rotation (old key ID, new key ID)
- SSH key deletion (who deleted, when)
- Failed access attempts

**Audit Entry Structure**:
```rust
pub struct SshKeyAuditEntry {
    pub timestamp: DateTime<Utc>,
    pub key_id: String,
    pub action: SshKeyAction,
    pub user: Option<String>,      // User who performed action
    pub ip_address: Option<String>, // Source IP
    pub success: bool,
    pub error_message: Option<String>,
}
```

**Threat Mitigation**:
- ✅ **Unauthorized Access**: Full audit trail for forensics
- ✅ **Insider Threats**: User attribution for all actions
- ✅ **Compliance**: Supports GDPR and SOC 2 audit log requirements

**Audit Log Retention**:
- In-memory: Last 10,000 entries
- Persistent: SurrealDB with 1-year retention
- Compliance mode: 7-year retention (configurable)
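
The in-memory tier can be pictured as a fixed-capacity queue that evicts its oldest entry once full. This is a minimal sketch under that assumption; `BoundedAuditLog` is a hypothetical name, and the real implementation may differ (for example, by flushing to SurrealDB before eviction).

```rust
use std::collections::VecDeque;

/// Hypothetical fixed-capacity buffer for recent audit entries.
pub struct BoundedAuditLog<T> {
    entries: VecDeque<T>,
    capacity: usize,
}

impl<T> BoundedAuditLog<T> {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self {
            entries: VecDeque::with_capacity(capacity),
            capacity,
        }
    }

    /// Append an entry. When the buffer is full, the oldest entry is
    /// evicted and returned so the caller can confirm it was persisted.
    pub fn push(&mut self, entry: T) -> Option<T> {
        let evicted = if self.entries.len() == self.capacity {
            self.entries.pop_front()
        } else {
            None
        };
        self.entries.push_back(entry);
        evicted
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }

    pub fn is_empty(&self) -> bool {
        self.entries.is_empty()
    }

    /// Iterate from oldest to newest.
    pub fn iter(&self) -> impl Iterator<Item = &T> {
        self.entries.iter()
    }
}
```

Constructing it with `BoundedAuditLog::new(10_000)` would match the in-memory limit listed above.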

### 1.4 Key Fingerprinting

**Implementation**:
```rust
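fn calculate_fingerprint(public_key: &[u8]) -> Result<String, KmsError> {
    // Engine-based API from base64 0.21+; `base64::encode` is deprecated there.
    use base64::{engine::general_purpose::STANDARD_NO_PAD, Engine as _};
    use sha2::{Digest, Sha256};

    // Encode the full 32-byte digest rather than a truncated prefix.
    // Unpadded base64 matches the `SHA256:...` form printed by `ssh-keygen -l`,
    // provided `public_key` is the decoded key blob and not the text line.
    let digest = Sha256::digest(public_key);
    Ok(format!("SHA256:{}", STANDARD_NO_PAD.encode(digest)))
}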
```

**Security Benefits**:
- Verify key integrity
- Detect key tampering
- Match deployed keys to KMS records

## 2. RBAC Security

### 2.1 Execution Modes

**Security Model by Mode**:

| Mode | Security Level | Use Case | Audit Required |
|------|---------------|----------|----------------|
| Solo | Low | Single developer | No |
| MultiUser | Medium | Small teams | Optional |
| CICD | Medium | Automation | Mandatory |
| Enterprise | High | Production | Mandatory |

**Mode-Specific Security**:

#### Solo Mode
```rust
// Solo mode: All users are admin
// Security: Trust-based, no RBAC checks
if mode == ExecutionMode::Solo {
    return true;  // Allow all operations
}
```

**Risks**:
- No access control
- No audit trail
- Single point of failure

**Mitigations**:
- Only for development environments
- Network isolation required
- Regular backups

#### MultiUser Mode
```rust
// Multi-user: Role-based access control
let permissions = rbac_manager.get_user_permissions(&user).await;
if !permissions.contains(&required_permission) {
    return Err(RbacError::PermissionDenied);
}
```

**Security Features**:
- Role-based permissions
- Optional audit logging
- Session management

#### CICD Mode
```rust
// CICD: Service account focused
// All actions logged for automation tracking
if mode == ExecutionMode::CICD {
    audit_log.log_automation_action(service_account, action).await;
}
```

**Security Features**:
- Service account isolation
- Mandatory audit logging
- Token-based authentication
- Short-lived credentials

#### Enterprise Mode
```rust
// Enterprise: Full security
// - Mandatory audit logging
// - Stricter session timeouts
// - Compliance reports
if mode == ExecutionMode::Enterprise {
    audit_log.log_with_compliance(user, action, compliance_tags).await;
}
```

**Security Features**:
- Full RBAC enforcement
- Comprehensive audit logging
- Compliance reporting
- Role assignment approval workflow

### 2.2 Permission System

**Permission Levels**:
```rust
Role::Admin         => 100  // Full access
Role::Operator      =>  80  // Deploy & manage
Role::Developer     =>  60  // Read + dev deploy
Role::ServiceAccount =>  50  // Automation
Role::Auditor       =>  40  // Read + audit
Role::Viewer        =>  20  // Read-only
```

**Action Security Levels**:
```rust
Action::Delete      => 100  // Destructive, admin only
Action::Manage      =>  80  // Service management
Action::Deploy      =>  80  // Deploy to production
Action::Create      =>  60  // Create resources
Action::Update      =>  60  // Modify resources
Action::Execute     =>  50  // Execute operations
Action::Audit       =>  40  // View audit logs
Action::Read        =>  20  // View resources
```

**Permission Check**:
```rust
pub fn can_perform(&self, required_level: u8) -> bool {
    self.permission_level() >= required_level
}
```

**Security Guarantees**:
- ✅ Least privilege by default (Viewer role)
- ✅ Hierarchical permissions (higher roles include lower)
- ✅ Explicit deny for unknown resources
- ✅ No permission escalation without admin
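
Putting the two level tables and the check together gives a small, testable model. The sketch below copies the numeric levels from this section; the enum definitions, `required_level`, and `can` are assumptions about how the pieces might be wired, not the actual `rbac_manager` types.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    Admin,
    Operator,
    Developer,
    ServiceAccount,
    Auditor,
    Viewer,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    Delete,
    Manage,
    Deploy,
    Create,
    Update,
    Execute,
    Audit,
    Read,
}

impl Role {
    /// Numeric level per role, as listed under "Permission Levels".
    pub fn permission_level(&self) -> u8 {
        match self {
            Role::Admin => 100,
            Role::Operator => 80,
            Role::Developer => 60,
            Role::ServiceAccount => 50,
            Role::Auditor => 40,
            Role::Viewer => 20,
        }
    }

    pub fn can_perform(&self, required_level: u8) -> bool {
        self.permission_level() >= required_level
    }

    /// Hypothetical convenience wrapper around `can_perform`.
    pub fn can(&self, action: Action) -> bool {
        self.can_perform(action.required_level())
    }
}

impl Action {
    /// Numeric level per action, as listed under "Action Security Levels".
    pub fn required_level(&self) -> u8 {
        match self {
            Action::Delete => 100,
            Action::Manage | Action::Deploy => 80,
            Action::Create | Action::Update => 60,
            Action::Execute => 50,
            Action::Audit => 40,
            Action::Read => 20,
        }
    }
}
```

One consequence of a purely numeric hierarchy is worth noting: `Role::ServiceAccount` (50) also satisfies `Action::Audit` (40). If service accounts should not read audit logs, a per-resource rule would be needed on top of the level comparison.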

### 2.3 Session Security

**Session Configuration**:
```toml
[security]
session_timeout_minutes = 60     # Solo/MultiUser; set to 30 for Enterprise
max_sessions_per_user = 5
failed_login_lockout_attempts = 5
failed_login_lockout_duration_minutes = 15
```
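
The two lockout settings can be read as "lock the account for 15 minutes after 5 consecutive failures". The following sketch shows one way to track that in memory. `LoginGuard` and `LockoutPolicy` are hypothetical names, and the caller passes in the current time so the logic stays easy to test.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical mirror of the lockout settings in `[security]`.
pub struct LockoutPolicy {
    pub max_attempts: u32,
    pub lockout: Duration,
}

#[derive(Default)]
struct Record {
    failures: u32,
    locked_until: Option<Instant>,
}

pub struct LoginGuard {
    policy: LockoutPolicy,
    records: HashMap<String, Record>,
}

impl LoginGuard {
    pub fn new(policy: LockoutPolicy) -> Self {
        Self { policy, records: HashMap::new() }
    }

    /// True while the user's lockout window is still open at `now`.
    pub fn is_locked(&self, user: &str, now: Instant) -> bool {
        self.records
            .get(user)
            .and_then(|record| record.locked_until)
            .map_or(false, |until| now < until)
    }

    /// Count a failed login and start a lockout once the limit is reached.
    pub fn record_failure(&mut self, user: &str, now: Instant) {
        let max_attempts = self.policy.max_attempts;
        let lockout = self.policy.lockout;
        let record = self.records.entry(user.to_string()).or_default();
        // An expired lockout starts a fresh count.
        if record.locked_until.map_or(false, |until| now >= until) {
            *record = Record::default();
        }
        record.failures += 1;
        if record.failures >= max_attempts {
            record.locked_until = Some(now + lockout);
        }
    }

    /// A successful login clears the failure history.
    pub fn record_success(&mut self, user: &str) {
        self.records.remove(user);
    }
}
```

A login handler would call `is_locked` before checking credentials, then `record_failure` or `record_success` depending on the outcome.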

**Session Lifecycle**:
1. User authenticates → JWT token issued
2. Token includes: user_id, role, issued_at, expires_at
3. Middleware validates token on each request
4. Session tracked in Redis/RocksDB
5. Session invalidated on logout or timeout

**Security Features**:
- JWT with RSA-2048 signatures
- Refresh token rotation
- Session fixation prevention
- Concurrent session limits

**Threat Mitigation**:
- ✅ **Session Hijacking**: Short-lived tokens (1 hour)
- ✅ **Token Replay**: One-time refresh tokens
- ✅ **Brute Force**: Account lockout after 5 failures
- ✅ **Session Fixation**: New session ID on login
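
One-time refresh tokens can be modeled as a store in which presenting a token consumes it and issues a replacement, so a replayed token is rejected. The sketch below is a toy model: `RefreshStore` is a hypothetical name, and the counter-based token format is a placeholder only, since real tokens must come from a cryptographically secure random source.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq)]
pub enum RefreshError {
    /// The token was never issued or has already been used.
    UnknownOrReused,
}

/// Hypothetical in-memory store of active refresh tokens.
pub struct RefreshStore {
    active: HashMap<String, String>, // token -> user_id
    counter: u64,
}

impl RefreshStore {
    pub fn new() -> Self {
        Self { active: HashMap::new(), counter: 0 }
    }

    /// Issue a new refresh token for `user_id`.
    pub fn issue(&mut self, user_id: &str) -> String {
        self.counter += 1;
        // Placeholder format for illustration; NOT unpredictable.
        let token = format!("rt-{}-{}", user_id, self.counter);
        self.active.insert(token.clone(), user_id.to_string());
        token
    }

    /// Consume `presented` and return its replacement. A token that was
    /// already consumed (a replay) is rejected.
    pub fn rotate(&mut self, presented: &str) -> Result<String, RefreshError> {
        let user_id = self
            .active
            .remove(presented)
            .ok_or(RefreshError::UnknownOrReused)?;
        Ok(self.issue(&user_id))
    }
}
```

A stricter variant could revoke every token belonging to the user when a reused token is detected, on the theory that reuse signals theft.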

### 2.4 Middleware Security

**RBAC Middleware Flow**:
```
Request → Auth Middleware → RBAC Middleware → Handler
            ↓                    ↓
        Extract User      Check Permission
        from JWT Token    (role + resource + action)
                               ↓
                         Allow / Deny
```

**Middleware Implementation**:
```rust
pub async fn check_permission(
    State(state): State<Arc<RbacMiddleware>>,
    resource: Resource,
    action: Action,
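    req: Request,
    next: Next,
) -> Result<Response, RbacError> {
    // Clone the user so no borrow of `req` is held across the `.await` below
    // (assumes `User: Clone`).
    let user = req.extensions()
        .get::<User>()
        .cloned()
        .ok_or_else(|| RbacError::UserNotFound("No user in request".to_string()))?;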

    if !state.rbac_manager.check_permission(&user, resource, action).await {
        return Err(RbacError::PermissionDenied);
    }

    Ok(next.run(req).await)
}
```

**Security Guarantees**:
- ✅ All API endpoints protected by default
- ✅ Permission checked before handler execution
- ✅ User context available in handlers
- ✅ Failed checks logged for audit

## 3. Platform Monitoring Security

### 3.1 Service Access Security

**Internal URLs Only**:
```toml
[platform]
orchestrator_url = "http://localhost:8080"       # Not exposed externally
coredns_url = "http://localhost:9153"
gitea_url = "http://localhost:3000"
oci_registry_url = "http://localhost:5000"
```

**Network Security**:
- All services on localhost or internal network
- No external exposure of monitoring endpoints
- Firewall rules to prevent external access

**Threat Mitigation**:
- ✅ **External Scanning**: Services not reachable from internet
- ✅ **DDoS**: Internal-only access limits attack surface
- ✅ **Data Exfiltration**: Monitoring data not exposed externally

### 3.2 Health Check Security

**Timeout Protection**:
```rust
let client = Client::builder()
    .timeout(std::time::Duration::from_secs(5))  // Prevent hanging
    .build()
    .expect("failed to build HTTP client");
```

**Error Handling**:
```rust
// Never expose internal errors to users
Err(e) => {
    // Log detailed error internally
    tracing::error!("Health check failed for {}: {}", service, e);

    // Return generic error externally
    ServiceStatus {
        status: HealthStatus::Unhealthy,
        error_message: Some("Service unavailable".to_string()),  // Generic
        ..Default::default()  // Remaining fields; assumes `ServiceStatus: Default`
    }
}
```

**Threat Mitigation**:
- ✅ **Timeout Attacks**: 5-second timeout prevents resource exhaustion
- ✅ **Information Disclosure**: Error messages sanitized
- ✅ **Resource Exhaustion**: Parallel checks with concurrency limits
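
To illustrate the concurrency limit, the sketch below runs checks in batches so that at most `max_parallel` are in flight at once. It uses standard-library scoped threads as a stand-in for the async runtime the service presumably uses; `run_checks_limited` is a hypothetical helper, and each check is reduced to a closure returning `true` for healthy.

```rust
use std::thread;

/// Run health checks with at most `max_parallel` executing concurrently.
/// Results come back in the same order as `checks`.
pub fn run_checks_limited<F>(checks: Vec<F>, max_parallel: usize) -> Vec<bool>
where
    F: FnOnce() -> bool + Send,
{
    assert!(max_parallel > 0, "max_parallel must be non-zero");
    let mut results = Vec::with_capacity(checks.len());
    let mut pending = checks.into_iter().peekable();

    while pending.peek().is_some() {
        // Take the next batch; its size is the concurrency limit.
        let batch: Vec<F> = pending.by_ref().take(max_parallel).collect();
        let batch_results: Vec<bool> = thread::scope(|scope| {
            let handles: Vec<_> = batch
                .into_iter()
                .map(|check| scope.spawn(check))
                .collect();
            // A check that panics is reported as unhealthy.
            handles
                .into_iter()
                .map(|handle| handle.join().unwrap_or(false))
                .collect()
        });
        results.extend(batch_results);
    }
    results
}
```

Batching is the simplest way to bound parallelism; a semaphore would keep the pipeline fuller when individual checks vary widely in duration.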

### 3.3 Service Control Security

**RBAC-Protected Service Control**:
```rust
// Only Operator or Admin can start/stop services
#[axum::debug_handler]
pub async fn start_service(
    State(state): State<AppState>,
    Extension(user): Extension<User>,
    Path(service_type): Path<String>,
) -> Result<StatusCode, ApiError> {
    // Check permission (the managers are assumed to be fields of `AppState`)
    if !state.rbac_manager.check_permission(
        &user,
        Resource::Service,
        Action::Manage,
    ).await {
        return Err(ApiError::PermissionDenied);
    }

    // Start service
    state.service_manager.start_service(&service_type).await?;

    // Audit log
    state.audit_log.log_service_action(user, service_type, "start").await;

    Ok(StatusCode::OK)
}
```

**Security Guarantees**:
- ✅ Only authorized users can control services
- ✅ All service actions logged
- ✅ Graceful degradation on service failure

## 4. Threat Model

### 4.1 High-Risk Threats

#### Threat: SSH Private Key Exposure
**Attack Vector**: Attacker gains access to KMS database

**Mitigations**:
- Private keys encrypted at rest with master key
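- Master key stored separately from the encrypted key data, in an external KMS or hardware security module (HSM); native HSM integration is a Phase 3 roadmap item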
- Key access audited and rate-limited
- Zeroization of decrypted keys in memory

**Detection**:
- Audit log monitoring for unusual key access patterns
- Alerting on bulk key retrievals

#### Threat: Privilege Escalation
**Attack Vector**: Lower-privileged user attempts to gain admin access

**Mitigations**:
- Role assignment requires Admin role
- Mode switching requires Admin role
- Middleware enforces permissions on every request
- No client-side permission checks (server-side only)

**Detection**:
- Failed permission checks logged
- Alerting on repeated permission denials

#### Threat: Session Hijacking
**Attack Vector**: Attacker steals JWT token

**Mitigations**:
- Short-lived access tokens (1 hour)
- Refresh token rotation
- Secure HTTP-only cookies (recommended)
- IP address binding (optional)

**Detection**:
- Unusual login locations
- Concurrent sessions from different IPs

### 4.2 Medium-Risk Threats

#### Threat: Service Impersonation
**Attack Vector**: Malicious service pretends to be legitimate platform service

**Mitigations**:
- Service URLs configured in config file (not dynamic)
- TLS certificate validation (if HTTPS)
- Service authentication tokens

**Detection**:
- Health check failures
- Metrics anomalies

#### Threat: Audit Log Tampering
**Attack Vector**: Attacker modifies audit logs to hide tracks

**Mitigations**:
- Audit logs append-only
- Logs persisted in SurrealDB, with tamper evidence provided by the hash chain
- Hash chain for log integrity
- Offsite log backup

**Detection**:
- Hash chain verification
- Log gap detection
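
The hash chain and gap detection can be sketched as follows: each entry stores the hash of its predecessor, so editing or removing an entry breaks every later link. All names here are hypothetical. The digest uses the standard library's `DefaultHasher` purely to keep the example dependency-free; it is not a cryptographic hash, and a real deployment would substitute something like SHA-256.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone)]
pub struct ChainedEntry {
    pub sequence: u64,
    pub message: String,
    pub prev_hash: u64,
    pub hash: u64,
}

/// Stand-in digest. NOT cryptographic; replace with SHA-256 in practice.
fn digest(sequence: u64, message: &str, prev_hash: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    sequence.hash(&mut hasher);
    message.hash(&mut hasher);
    prev_hash.hash(&mut hasher);
    hasher.finish()
}

/// Append an entry linked to the current tail of the chain.
pub fn append(chain: &mut Vec<ChainedEntry>, message: &str) {
    let (sequence, prev_hash) = match chain.last() {
        Some(last) => (last.sequence + 1, last.hash),
        None => (0, 0),
    };
    chain.push(ChainedEntry {
        sequence,
        message: message.to_string(),
        prev_hash,
        hash: digest(sequence, message, prev_hash),
    });
}

/// Verify the chain, returning the index of the first bad entry.
pub fn verify(chain: &[ChainedEntry]) -> Result<(), usize> {
    let mut expected_prev = 0u64;
    for (index, entry) in chain.iter().enumerate() {
        let sequence_ok = entry.sequence == index as u64; // gap detection
        let link_ok = entry.prev_hash == expected_prev;
        let hash_ok =
            entry.hash == digest(entry.sequence, &entry.message, entry.prev_hash);
        if !(sequence_ok && link_ok && hash_ok) {
            return Err(index);
        }
        expected_prev = entry.hash;
    }
    Ok(())
}
```

Because an attacker with write access could recompute an unkeyed chain, the latest hash would also need to be anchored somewhere they cannot reach, such as the offsite backup.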

### 4.3 Low-Risk Threats

#### Threat: Information Disclosure via Error Messages
**Attack Vector**: Error messages leak internal information

**Mitigations**:
- Generic error messages for users
- Detailed errors only in server logs
- Error message sanitization

**Detection**:
- Code review for error handling
- Automated scanning for sensitive data in responses

## 5. Compliance Considerations

### 5.1 GDPR Compliance

**Personal Data Handling**:
- User information: username, email, IP addresses
- Retention: Audit logs kept for required period
- Right to erasure: User deletion removes the user record and SSH keys; audit log entries are retained but anonymized

**Implementation**:
```rust
// Delete user and all associated data
pub async fn delete_user(&self, user_id: &str) -> Result<(), RbacError> {
    // Delete user SSH keys
    for key in self.list_user_ssh_keys(user_id).await? {
        self.delete_ssh_key(&key.key_id).await?;
    }

    // Anonymize audit logs (retain for compliance, remove PII)
    self.anonymize_user_audit_logs(user_id).await?;

    // Delete user record
    self.delete_user_record(user_id).await?;

    Ok(())
}
```

### 5.2 SOC 2 Compliance

**Security Controls**:
- ✅ Access control (RBAC)
- ✅ Audit logging (all actions logged)
- ✅ Encryption at rest (KMS)
- ✅ Encryption in transit (HTTPS recommended)
- ✅ Session management (timeouts, concurrent session limits; MFA planned for Phase 3)

**Monitoring & Alerting**:
- ✅ Service health monitoring
- ✅ Failed login tracking
- ✅ Permission denial alerting
- ✅ Unusual activity detection

### 5.3 PCI DSS (if applicable)

**Requirements**:
- ✅ Encrypt cardholder data (use KMS for keys)
- ✅ Maintain access control (RBAC)
- ✅ Track and monitor access (audit logs)
- ✅ Regularly test security (integration tests)

## 6. Security Best Practices

### 6.1 Development

**Code Review Checklist**:
- [ ] All API endpoints have RBAC middleware
- [ ] No hardcoded secrets or keys
- [ ] Error messages don't leak sensitive info
- [ ] Audit logging for sensitive operations
- [ ] Input validation on all user inputs
- [ ] SQL injection prevention (use parameterized queries)
- [ ] XSS prevention (escape user-supplied data on output)

**Testing**:
- Unit tests for permission checks
- Integration tests for RBAC enforcement
- Penetration testing for production deployments

### 6.2 Deployment

**Production Checklist**:
- [ ] Change default admin password
- [ ] Enable HTTPS with valid certificate
- [ ] Configure firewall rules (internal services only)
- [ ] Set appropriate execution mode (Enterprise for production)
- [ ] Enable audit logging
- [ ] Configure session timeout (30 minutes for Enterprise)
- [ ] Enable rate limiting
- [ ] Set up log monitoring and alerting
- [ ] Regular security updates
- [ ] Backup encryption keys

### 6.3 Operations

**Incident Response**:
1. **Detection**: Monitor audit logs for anomalies
2. **Containment**: Revoke compromised credentials
3. **Eradication**: Rotate affected SSH keys
4. **Recovery**: Restore from backup if needed
5. **Lessons Learned**: Update security controls

**Key Rotation Schedule**:
- SSH keys: Every 90 days (Enterprise: 30 days)
- JWT signing keys: Every 180 days
- Master encryption key: Every 365 days
- Service account tokens: Every 30 days

## 7. Security Metrics

### 7.1 Monitoring Metrics

**Authentication**:
- Failed login attempts per user
- Concurrent sessions per user
- Session duration (average, p95, p99)

**Authorization**:
- Permission denials per user
- Permission denials per resource
- Role assignments per day

**Audit**:
- SSH key accesses per day
- SSH key rotations per month
- Audit log retention compliance

**Services**:
- Service health check success rate
- Service response times (p50, p95, p99)
- Service dependency failures

### 7.2 Alerting Thresholds

**Critical Alerts**:
- Service health: >3 failures in 5 minutes
- Failed logins: >10 attempts in 1 minute
- Permission denials: >50 in 1 minute
- SSH key bulk retrieval: >10 keys in 1 minute

**Warning Alerts**:
- Service degraded: response time >1 second
- Session timeout rate: >10% of sessions
- Audit log storage: >80% capacity
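
Thresholds of the form "more than N events in M seconds" map naturally onto a sliding-window counter. The sketch below is one possible shape; `SlidingWindowAlert` is a hypothetical name, timestamps are plain seconds supplied by the caller, and they are assumed to be non-decreasing.

```rust
use std::collections::VecDeque;

/// Hypothetical sliding-window counter for rate-based alerts.
pub struct SlidingWindowAlert {
    threshold: usize,
    window_secs: u64,
    events: VecDeque<u64>,
}

impl SlidingWindowAlert {
    pub fn new(threshold: usize, window_secs: u64) -> Self {
        Self { threshold, window_secs, events: VecDeque::new() }
    }

    /// Record an event at `now_secs` and report whether the number of
    /// events inside the window now exceeds the threshold.
    pub fn record(&mut self, now_secs: u64) -> bool {
        self.events.push_back(now_secs);
        // Drop events that have aged out of the window.
        while let Some(&oldest) = self.events.front() {
            if now_secs.saturating_sub(oldest) >= self.window_secs {
                self.events.pop_front();
            } else {
                break;
            }
        }
        self.events.len() > self.threshold
    }
}
```

For the failed-login alert above, `SlidingWindowAlert::new(10, 60)` fires on the eleventh failure within a minute and resets on its own once the older events age out.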

## 8. Security Roadmap

### Phase 1 (Completed)
- ✅ SSH key storage with encryption
- ✅ Mode-based RBAC
- ✅ Audit logging
- ✅ Platform monitoring

### Phase 2 (In Progress)
- 📋 API handlers with RBAC enforcement
- 📋 Integration tests for security
- 📋 Documentation

### Phase 3 (Future)
- Multi-factor authentication (MFA)
- Hardware security module (HSM) integration
- Advanced threat detection (ML-based)
- Automated security scanning
- Compliance report generation
- Security information and event management (SIEM) integration

## References

- **OWASP Top 10**: https://owasp.org/www-project-top-ten/
- **NIST Cybersecurity Framework**: https://www.nist.gov/cyberframework
- **CIS Controls**: https://www.cisecurity.org/controls
- **GDPR**: https://gdpr.eu/
- **SOC 2**: https://www.aicpa.org/interestareas/frc/assuranceadvisoryservices/socforserviceorganizations.html

---

**Last Updated**: 2025-10-06
**Review Cycle**: Quarterly
**Next Review**: 2026-01-06