Jesús Pérez 09a97ac8f5

chore: update platform submodule to monorepo crates structure

Platform restructured into crates/, added AI service and detector, migrated control-center-ui to Leptos 0.8

2026-01-08 21:32:59 +00:00


# Security Considerations for Control Center Enhancements

## Overview

This document outlines the security architecture and considerations for the control-center enhancements, including KMS SSH key management, mode-based RBAC, and platform service monitoring.

## 1. SSH Key Management Security

### 1.1 Key Storage Security

**Implementation**:

- Private keys encrypted at rest using AES-256-GCM in KMS
- Public keys stored in plaintext (as they are meant to be public)
- Private key material never exposed in API responses
- Key IDs used as references, not actual keys

**Threat Mitigation**:

- ✅ **Data at Rest**: All private keys encrypted with master encryption key
- ✅ **Key Exposure**: Private keys only decrypted in memory when needed
- ✅ **Key Leakage**: Zeroization of key material after use
- ✅ **Unauthorized Access**: KMS access controlled by RBAC

**Best Practices**:

```rust
// Good: store the key and pass around only the returned key ID reference
let key_id = ssh_key_manager
    .store_ssh_key(name, private_key, public_key, purpose, tags)
    .await?;

// Bad: never do this - it writes private key material into the logs
tracing::info!("Stored key: {}", private_key); // DON'T DO THIS
```
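The zeroization and "never log key material" rules above can also be enforced by the type system. The sketch below is a hypothetical `SecretBytes` wrapper (not part of the control-center code) that redacts itself in `Debug` output and overwrites its buffer with volatile writes when dropped. It only wipes the final buffer, not copies left behind by earlier reallocations; in practice a maintained crate such as `zeroize` is the safer choice.

```rust
use std::fmt;
use std::sync::atomic::{compiler_fence, Ordering};

/// Hypothetical owned buffer for secret key material.
pub struct SecretBytes(Vec<u8>);

impl SecretBytes {
    pub fn new(bytes: Vec<u8>) -> Self {
        SecretBytes(bytes)
    }

    /// Borrow the secret for immediate use (e.g. signing); avoid cloning it.
    pub fn expose(&self) -> &[u8] {
        &self.0
    }

    /// Overwrite every byte with zero. Volatile writes keep the compiler from
    /// dropping the stores as "dead" because the buffer is about to be freed.
    pub fn wipe(&mut self) {
        for byte in self.0.iter_mut() {
            // SAFETY: `byte` is a valid, aligned, exclusive reference into the Vec.
            unsafe { std::ptr::write_volatile(byte as *mut u8, 0) };
        }
        // Keep the compiler from moving the wipe past this point.
        compiler_fence(Ordering::SeqCst);
    }
}

impl Drop for SecretBytes {
    fn drop(&mut self) {
        self.wipe();
    }
}

// Redacted Debug output: `{:?}` in a log line never prints the key bytes.
impl fmt::Debug for SecretBytes {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "SecretBytes([REDACTED; {} bytes])", self.0.len())
    }
}
```

Because the wrapper implements no `Display`, a line like `tracing::info!("{}", secret)` fails to compile instead of leaking the key.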

### 1.2 Key Rotation Security

**Implementation**:

- Configurable rotation intervals (default 90 days)
- Grace period for old key usage (default 7 days)
- Automatic rotation scheduling (if enabled)
- Manual rotation support with immediate effect

**Threat Mitigation**:

- ✅ **Key Compromise**: Regular rotation limits exposure window
- ✅ **Stale Keys**: Automated detection of keys due for rotation
- ✅ **Rotation Failures**: Graceful degradation with error logging

**Rotation Policy**:

```toml
[kms.ssh_keys]
rotation_enabled = true
rotation_interval_days = 90   # Enterprise: 30, Dev: 180
grace_period_days = 7          # Time to update deployed keys
auto_rotate = false            # Manual approval recommended
```
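A minimal sketch of the rotation checks implied by this policy, assuming key ages are tracked in whole days. `RotationPolicy`, `is_due`, and `old_key_accepted` are hypothetical names; the defaults mirror the TOML above.

```rust
/// Hypothetical mirror of the `[kms.ssh_keys]` settings.
#[derive(Debug, Clone, Copy)]
pub struct RotationPolicy {
    pub rotation_interval_days: u64,
    pub grace_period_days: u64,
}

impl Default for RotationPolicy {
    fn default() -> Self {
        // Defaults from the configuration above.
        RotationPolicy { rotation_interval_days: 90, grace_period_days: 7 }
    }
}

impl RotationPolicy {
    /// A key is due once it has been in use for a full rotation interval.
    pub fn is_due(&self, key_age_days: u64) -> bool {
        key_age_days >= self.rotation_interval_days
    }

    /// After rotation, the old key stays valid only inside the grace window
    /// [0, grace_period_days), giving deployed hosts time to pick up the new key.
    pub fn old_key_accepted(&self, days_since_rotation: u64) -> bool {
        days_since_rotation < self.grace_period_days
    }
}
```

Treating the grace period as a half-open window (day 7 is already outside it) is a choice made in this sketch; the policy above does not say whether the boundary day is included.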

### 1.3 Audit Logging

**Logged Events**:

- SSH key creation (who, when, purpose)
- SSH key retrieval (who accessed, when)
- SSH key rotation (old key ID, new key ID)
- SSH key deletion (who deleted, when)
- Failed access attempts

**Audit Entry Structure**:

```rust
use chrono::{DateTime, Utc};

pub struct SshKeyAuditEntry {
    pub timestamp: DateTime<Utc>,
    pub key_id: String,
    pub action: SshKeyAction,       // One of the logged events above (defined elsewhere)
    pub user: Option<String>,       // User who performed action
    pub ip_address: Option<String>, // Source IP
    pub success: bool,              // false for failed access attempts
    pub error_message: Option<String>,
}
```

**Threat Mitigation**:

- ✅ **Unauthorized Access**: Full audit trail for forensics
- ✅ **Insider Threats**: User attribution for all actions
- ✅ **Compliance**: Supports GDPR and SOC 2 audit log requirements

**Audit Log Retention**:

- In-memory: Last 10,000 entries
- Persistent: SurrealDB with 1-year retention
- Compliance mode: 7-year retention (configurable)
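The in-memory tier ("last 10,000 entries") behaves like a ring buffer: once full, each new entry evicts the oldest. Below is a generic sketch with a hypothetical `BoundedAuditLog` type, assuming every entry is also written to the persistent store (SurrealDB) independently so that eviction never loses data. It would be instantiated as, for example, `BoundedAuditLog::<SshKeyAuditEntry>::new(10_000)`.

```rust
use std::collections::VecDeque;

/// Hypothetical fixed-size, in-memory audit buffer (oldest entries evicted first).
pub struct BoundedAuditLog<T> {
    capacity: usize,
    entries: VecDeque<T>,
}

impl<T> BoundedAuditLog<T> {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, entries: VecDeque::with_capacity(capacity) }
    }

    /// Append an entry, returning the evicted oldest entry when the buffer is full.
    pub fn push(&mut self, entry: T) -> Option<T> {
        if self.capacity == 0 {
            return Some(entry); // nothing can be retained
        }
        let evicted = if self.entries.len() == self.capacity {
            self.entries.pop_front()
        } else {
            None
        };
        self.entries.push_back(entry);
        evicted
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }

    pub fn is_empty(&self) -> bool {
        self.entries.is_empty()
    }

    /// Iterate from oldest to newest.
    pub fn iter(&self) -> impl Iterator<Item = &T> {
        self.entries.iter()
    }
}
```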

### 1.4 Key Fingerprinting

**Implementation**:

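```rust
use base64::{engine::general_purpose::STANDARD_NO_PAD, Engine as _};
use sha2::{Digest, Sha256};

/// OpenSSH-style SHA-256 fingerprint. `public_key` must be the raw key blob
/// (the base64-decoded middle field of an `authorized_keys` line) for the
/// result to match `ssh-keygen -lf`.
fn calculate_fingerprint(public_key: &[u8]) -> Result<String, KmsError> {
    let digest = Sha256::digest(public_key);
    // Use the full 32-byte digest, unpadded, as OpenSSH does
    // (truncating it would never match deployed keys).
    Ok(format!("SHA256:{}", STANDARD_NO_PAD.encode(digest)))
}
```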

**Security Benefits**:

- Verify key integrity
- Detect key tampering
- Match deployed keys to KMS records

## 2. RBAC Security

### 2.1 Execution Modes

**Security Model by Mode**:

| Mode | Security Level | Use Case | Audit Required |
|------|---------------|----------|----------------|
| Solo | Low | Single developer | No |
| MultiUser | Medium | Small teams | Optional |
| CICD | Medium | Automation | Yes |
| Enterprise | High | Production | Mandatory |
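One way to encode the table is one function per property, so handlers never re-derive policy from the mode themselves. This is an illustrative sketch: `AuditRequirement`, `enforces_rbac`, and `should_audit` are hypothetical names, and the table's "Yes" (CICD) and "Mandatory" (Enterprise) are both mapped to mandatory auditing, matching the CICD feature list below.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExecutionMode {
    Solo,
    MultiUser,
    CICD,
    Enterprise,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AuditRequirement {
    Off,       // Solo: "No audit trail"
    Optional,  // MultiUser: operator decides
    Mandatory, // CICD and Enterprise
}

impl ExecutionMode {
    pub fn audit_requirement(self) -> AuditRequirement {
        match self {
            ExecutionMode::Solo => AuditRequirement::Off,
            ExecutionMode::MultiUser => AuditRequirement::Optional,
            ExecutionMode::CICD | ExecutionMode::Enterprise => AuditRequirement::Mandatory,
        }
    }

    /// Solo mode skips RBAC checks entirely; every other mode enforces them.
    pub fn enforces_rbac(self) -> bool {
        !matches!(self, ExecutionMode::Solo)
    }

    /// Combine the mode's requirement with the operator's audit setting.
    pub fn should_audit(self, audit_enabled: bool) -> bool {
        match self.audit_requirement() {
            AuditRequirement::Off => false,
            AuditRequirement::Optional => audit_enabled,
            AuditRequirement::Mandatory => true,
        }
    }
}
```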

**Mode-Specific Security**:

#### Solo Mode

```rust
// Solo mode: All users are admin
// Security: Trust-based, no RBAC checks
if mode == ExecutionMode::Solo {
    return true;  // Allow all operations
}
```

**Risks**:

- No access control
- No audit trail
- Single point of failure

**Mitigations**:

- Only for development environments
- Network isolation required
- Regular backups

#### MultiUser Mode

```rust
// Multi-user: Role-based access control
let permissions = rbac_manager.get_user_permissions(&user).await;
if !permissions.contains(&required_permission) {
    return Err(RbacError::PermissionDenied);
}
```

**Security Features**:

- Role-based permissions
- Optional audit logging
- Session management

#### CICD Mode

```rust
// CICD: Service account focused
// All actions logged for automation tracking
if mode == ExecutionMode::CICD {
    audit_log.log_automation_action(service_account, action).await;
}
```

**Security Features**:

- Service account isolation
- Mandatory audit logging
- Token-based authentication
- Short-lived credentials

#### Enterprise Mode

```rust
// Enterprise: Full security
// - Mandatory audit logging
// - Stricter session timeouts
// - Compliance reports
if mode == ExecutionMode::Enterprise {
    audit_log.log_with_compliance(user, action, compliance_tags).await;
}
```

**Security Features**:

- Full RBAC enforcement
- Comprehensive audit logging
- Compliance reporting
- Role assignment approval workflow

### 2.2 Permission System

**Permission Levels**:

```rust
impl Role {
    pub fn permission_level(&self) -> u8 {
        match self {
            Role::Admin => 100,         // Full access
            Role::Operator => 80,       // Deploy & manage
            Role::Developer => 60,      // Read + dev deploy
            Role::ServiceAccount => 50, // Automation
            Role::Auditor => 40,        // Read + audit
            Role::Viewer => 20,         // Read-only
        }
    }
}
```

**Action Security Levels**:

```rust
impl Action {
    pub fn security_level(&self) -> u8 {
        match self {
            Action::Delete => 100, // Destructive, admin only
            Action::Manage => 80,  // Service management
            Action::Deploy => 80,  // Deploy to production
            Action::Create => 60,  // Create resources
            Action::Update => 60,  // Modify resources
            Action::Execute => 50, // Execute operations
            Action::Audit => 40,   // View audit logs
            Action::Read => 20,    // View resources
        }
    }
}
```

**Permission Check**:

```rust
pub fn can_perform(&self, required_level: u8) -> bool {
    self.permission_level() >= required_level
}
```

**Security Guarantees**:

- ✅ Least privilege by default (Viewer role)
- ✅ Hierarchical permissions (higher roles include lower)
- ✅ Explicit deny for unknown resources
- ✅ No permission escalation without admin
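The level tables and `can_perform` combine into a single comparison. The self-contained sketch below restates both tables as enums (`security_level`, `is_allowed`, and `can_assign_roles` are hypothetical names) and adds the Admin-only role-assignment rule from the threat model.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum Role {
    Admin,
    Operator,
    Developer,
    ServiceAccount,
    Auditor,
    #[default]
    Viewer, // least privilege by default
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    Delete,
    Manage,
    Deploy,
    Create,
    Update,
    Execute,
    Audit,
    Read,
}

impl Role {
    pub fn permission_level(self) -> u8 {
        match self {
            Role::Admin => 100,
            Role::Operator => 80,
            Role::Developer => 60,
            Role::ServiceAccount => 50,
            Role::Auditor => 40,
            Role::Viewer => 20,
        }
    }
}

impl Action {
    pub fn security_level(self) -> u8 {
        match self {
            Action::Delete => 100,
            Action::Manage | Action::Deploy => 80,
            Action::Create | Action::Update => 60,
            Action::Execute => 50,
            Action::Audit => 40,
            Action::Read => 20,
        }
    }
}

/// Hierarchical check: a role may perform any action at or below its level.
pub fn is_allowed(role: Role, action: Action) -> bool {
    role.permission_level() >= action.security_level()
}

/// Role assignment is reserved for Admin, independent of the level table.
pub fn can_assign_roles(actor: Role) -> bool {
    actor == Role::Admin
}
```

A pure level comparison means `Developer` (60) can never perform `Deploy` (80), so the "dev deploy" capability listed for developers presumably relies on resource or environment scoping that these tables do not show.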

### 2.3 Session Security

**Session Configuration**:

```toml
[security]
session_timeout_minutes = 60     # Solo/MultiUser; set to 30 for Enterprise
max_sessions_per_user = 5
failed_login_lockout_attempts = 5
failed_login_lockout_duration_minutes = 15
```

**Session Lifecycle**:

1. User authenticates → JWT token issued
2. Token includes: user_id, role, issued_at, expires_at
3. Middleware validates token on each request
4. Session tracked in Redis/RocksDB
5. Session invalidated on logout or timeout

**Security Features**:

- JWT with RSA-2048 signatures
- Refresh token rotation
- Session fixation prevention
- Concurrent session limits

**Threat Mitigation**:

- ✅ **Session Hijacking**: Short-lived tokens (1 hour)
- ✅ **Token Replay**: One-time refresh tokens
- ✅ **Brute Force**: Account lockout after 5 failures
- ✅ **Session Fixation**: New session ID on login
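The lockout rule (5 failures, 15-minute lockout) can be sketched as a small per-user counter. `LoginThrottle` is hypothetical, keeps its state in memory only (a real deployment would presumably share it through the same Redis/RocksDB store as sessions), and takes `now` as a parameter so the logic is testable without sleeping.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical in-memory failed-login tracker.
pub struct LoginThrottle {
    max_attempts: u32,
    lockout: Duration,
    // user -> (consecutive failures, locked until)
    state: HashMap<String, (u32, Option<Instant>)>,
}

impl LoginThrottle {
    pub fn new(max_attempts: u32, lockout: Duration) -> Self {
        Self { max_attempts, lockout, state: HashMap::new() }
    }

    /// Call before verifying credentials; reject the attempt outright while locked.
    pub fn is_locked(&self, user: &str, now: Instant) -> bool {
        matches!(self.state.get(user), Some((_, Some(until))) if now < *until)
    }

    pub fn record_failure(&mut self, user: &str, now: Instant) {
        let entry = self.state.entry(user.to_string()).or_insert((0, None));
        // An expired lockout starts a fresh count.
        if let Some(until) = entry.1 {
            if now >= until {
                *entry = (0, None);
            }
        }
        entry.0 += 1;
        if entry.0 >= self.max_attempts {
            entry.1 = Some(now + self.lockout);
        }
    }

    /// A successful login clears the counter.
    pub fn record_success(&mut self, user: &str) {
        self.state.remove(user);
    }
}
```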

### 2.4 Middleware Security

**RBAC Middleware Flow**:

```plaintext
Request → Auth Middleware → RBAC Middleware → Handler
            ↓                    ↓
        Extract User      Check Permission
        from JWT Token    (role + resource + action)
                               ↓
                         Allow / Deny
```

**Middleware Implementation**:

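```rust
// `resource` and `action` are not request extractors: bind them per route
// with a small wrapper closure passed to `axum::middleware::from_fn_with_state`.
pub async fn check_permission(
    State(state): State<Arc<RbacMiddleware>>,
    resource: Resource,
    action: Action,
    req: Request,
    next: Next,
) -> Result<Response, RbacError> {
    // The auth middleware runs first and inserts the authenticated `User`.
    let user = req
        .extensions()
        .get::<User>()
        .ok_or_else(|| RbacError::UserNotFound("No user in request".to_string()))?;

    if !state.rbac_manager.check_permission(user, resource, action).await {
        return Err(RbacError::PermissionDenied);
    }

    // Permission granted: hand the request on to the handler.
    Ok(next.run(req).await)
}
```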

**Security Guarantees**:

- ✅ All API endpoints protected by default
- ✅ Permission checked before handler execution
- ✅ User context available in handlers
- ✅ Failed checks logged for audit

## 3. Platform Monitoring Security

### 3.1 Service Access Security

**Internal URLs Only**:

```toml
[platform]
orchestrator_url = "http://localhost:9090"       # Not exposed externally
coredns_url = "http://localhost:9153"
gitea_url = "http://localhost:3000"
oci_registry_url = "http://localhost:5000"
```

**Network Security**:

- All services on localhost or internal network
- No external exposure of monitoring endpoints
- Firewall rules to prevent external access

**Threat Mitigation**:

- ✅ **External Scanning**: Services not reachable from internet
- ✅ **DDoS**: Internal-only access limits attack surface
- ✅ **Data Exfiltration**: Monitoring data not exposed externally
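A startup-time guard can refuse to run if a configured platform URL points outside the internal network. The hypothetical `is_internal_host` below checks an already-extracted host string; it accepts `localhost`, loopback addresses, and RFC 1918 private IPv4 ranges, and rejects any other hostname rather than resolving it via DNS. IPv6 unique-local addresses are not covered.

```rust
use std::net::IpAddr;

/// Hypothetical guard: true if `host` is loopback or a private IPv4 address.
pub fn is_internal_host(host: &str) -> bool {
    if host.eq_ignore_ascii_case("localhost") {
        return true;
    }
    match host.parse::<IpAddr>() {
        // 127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16
        Ok(IpAddr::V4(v4)) => v4.is_loopback() || v4.is_private(),
        Ok(IpAddr::V6(v6)) => v6.is_loopback(),
        // Other hostnames would need DNS resolution to classify; reject by default.
        Err(_) => false,
    }
}
```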

### 3.2 Health Check Security

**Timeout Protection**:

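```rust
use reqwest::Client;
use std::time::Duration;

let client = Client::builder()
    .timeout(Duration::from_secs(5)) // Bound each check so a hung service can't stall monitoring
    .build()
    .expect("failed to build HTTP client"); // Fails only if the TLS backend or resolver config can't load
```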

**Error Handling**:

```rust
// Never expose internal errors to users
Err(e) => {
    // Log detailed error internally
    tracing::error!("Health check failed for {}: {}", service, e);

    // Return generic error externally
    ServiceStatus {
        status: HealthStatus::Unhealthy,
        error_message: Some("Service unavailable".to_string()), // Generic
        ..Default::default() // assumes ServiceStatus implements Default
    }
}
```

**Threat Mitigation**:

- ✅ **Timeout Attacks**: 5-second timeout prevents resource exhaustion
- ✅ **Information Disclosure**: Error messages sanitized
- ✅ **Resource Exhaustion**: Parallel checks with concurrency limits
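"Parallel checks with concurrency limits" means at most N health checks are in flight at once. The control center presumably does this asynchronously (for example with a semaphore); the sketch below shows the same idea with only the standard library: a hypothetical `run_bounded` helper that starts `max_concurrency` worker threads, each pulling the next index from a shared counter, and returns results in input order.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Hypothetical helper: run `check` on every item with at most
/// `max_concurrency` checks in flight; results come back in input order.
pub fn run_bounded<T, R, F>(items: &[T], max_concurrency: usize, check: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    let next = AtomicUsize::new(0);
    let slots: Mutex<Vec<Option<R>>> = Mutex::new((0..items.len()).map(|_| None).collect());
    // Never spawn more workers than there are items.
    let workers = max_concurrency.max(1).min(items.len());

    thread::scope(|s| {
        for _ in 0..workers {
            s.spawn(|| loop {
                // Claim the next unchecked item; stop when none are left.
                let i = next.fetch_add(1, Ordering::Relaxed);
                if i >= items.len() {
                    break;
                }
                let result = check(&items[i]);
                slots.lock().unwrap()[i] = Some(result);
            });
        }
    }); // `scope` joins every worker before returning

    slots
        .into_inner()
        .unwrap()
        .into_iter()
        .map(|r| r.expect("every item is checked exactly once"))
        .collect()
}
```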

### 3.3 Service Control Security

**RBAC-Protected Service Control**:

```rust
// Only Operator or Admin can start/stop services (Action::Manage = 80)
#[axum::debug_handler] // requires axum's "macros" feature
pub async fn start_service(
    State(state): State<AppState>,
    Extension(user): Extension<User>,
    Path(service_type): Path<String>,
) -> Result<StatusCode, ApiError> {
    // Check permission before touching the service
    if !state
        .rbac_manager
        .check_permission(&user, Resource::Service, Action::Manage)
        .await
    {
        return Err(ApiError::PermissionDenied);
    }

    // Start service. Note: `?` returns before the audit call below, so
    // failed starts must be logged separately if every attempt is audited.
    state.service_manager.start_service(&service_type).await?;

    // Audit log
    state.audit_log.log_service_action(user, service_type, "start").await;

    Ok(StatusCode::OK)
}
```

**Security Guarantees**:

- ✅ Only authorized users can control services
- ✅ All service actions logged
- ✅ Graceful degradation on service failure

## 4. Threat Model

### 4.1 High-Risk Threats

#### Threat: SSH Private Key Exposure

**Attack Vector**: Attacker gains access to KMS database

**Mitigations**:

- Private keys encrypted at rest with master key
- Master key held by the KMS; hardware security module (HSM) integration is planned for Phase 3
- Key access audited and rate-limited
- Zeroization of decrypted keys in memory

**Detection**:

- Audit log monitoring for unusual key access patterns
- Alerting on bulk key retrievals

#### Threat: Privilege Escalation

**Attack Vector**: Lower-privileged user attempts to gain admin access

**Mitigations**:

- Role assignment requires Admin role
- Mode switching requires Admin role
- Middleware enforces permissions on every request
- No client-side permission checks (server-side only)

**Detection**:

- Failed permission checks logged
- Alerting on repeated permission denials

#### Threat: Session Hijacking

**Attack Vector**: Attacker steals JWT token

**Mitigations**:

- Short-lived access tokens (1 hour)
- Refresh token rotation
- Secure HTTP-only cookies (recommended)
- IP address binding (optional)

**Detection**:

- Unusual login locations
- Concurrent sessions from different IPs

### 4.2 Medium-Risk Threats

#### Threat: Service Impersonation

**Attack Vector**: Malicious service pretends to be legitimate platform service

**Mitigations**:

- Service URLs configured in config file (not dynamic)
- TLS certificate validation (if HTTPS)
- Service authentication tokens

**Detection**:

- Health check failures
- Metrics anomalies

#### Threat: Audit Log Tampering

**Attack Vector**: Attacker modifies audit logs to hide tracks

**Mitigations**:

- Audit logs are append-only (no update or delete path)
- Logs stored in SurrealDB (tamper evidence comes from the hash chain, not the database itself)
- Hash chain for log integrity
- Offsite log backup

**Detection**:

- Hash chain verification
- Log gap detection
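A hash chain makes each audit entry's hash depend on the previous entry's hash, so editing or deleting any entry breaks verification from that point on. The sketch below shows the structure only: `HashChain` is hypothetical, and `DefaultHasher` stands in for SHA-256 to keep the example dependency-free. `DefaultHasher` is not cryptographic and must not be used for real tamper evidence.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Placeholder digest chaining the previous hash into the next one.
/// NOT cryptographic: replace with SHA-256 (e.g. the `sha2` crate) in real use.
fn digest(prev: u64, payload: &str) -> u64 {
    let mut h = DefaultHasher::new();
    prev.hash(&mut h);
    payload.hash(&mut h);
    h.finish()
}

pub struct ChainedEntry {
    pub payload: String,
    pub hash: u64,
}

/// Hypothetical append-only audit chain.
#[derive(Default)]
pub struct HashChain {
    entries: Vec<ChainedEntry>,
}

impl HashChain {
    pub fn append(&mut self, payload: &str) {
        // The first entry chains from a fixed genesis value of 0.
        let prev = self.entries.last().map_or(0, |e| e.hash);
        let hash = digest(prev, payload);
        self.entries.push(ChainedEntry { payload: payload.to_string(), hash });
    }

    /// Recompute the chain; returns the index of the first entry that fails.
    pub fn verify(&self) -> Option<usize> {
        let mut prev = 0;
        for (i, entry) in self.entries.iter().enumerate() {
            if digest(prev, &entry.payload) != entry.hash {
                return Some(i);
            }
            prev = entry.hash;
        }
        None
    }

    /// Test-only access used to simulate an attacker editing storage.
    pub fn entries_mut(&mut self) -> &mut Vec<ChainedEntry> {
        &mut self.entries
    }
}
```

Truncating the tail of the chain is not detectable from the chain alone; anchoring the latest hash somewhere the attacker cannot write (for example, the offsite backup) closes that gap.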

### 4.3 Low-Risk Threats

#### Threat: Information Disclosure via Error Messages

**Attack Vector**: Error messages leak internal information

**Mitigations**:

- Generic error messages for users
- Detailed errors only in server logs
- Error message sanitization

**Detection**:

- Code review for error handling
- Automated scanning for sensitive data in responses
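Sanitization is easiest to enforce when every internal error type maps to a fixed, user-safe message. A sketch with a hypothetical `InternalError` enum; `eprintln!` stands in for the server-side `tracing::error!` call used elsewhere in this document.

```rust
/// Hypothetical internal error type; variants carry details for the logs only.
#[derive(Debug)]
pub enum InternalError {
    Database(String),
    Upstream(String),
    PermissionDenied,
    NotFound,
}

impl InternalError {
    /// Fixed, user-safe text that never includes the wrapped details.
    pub fn public_message(&self) -> &'static str {
        match self {
            InternalError::PermissionDenied => "Permission denied",
            InternalError::NotFound => "Not found",
            InternalError::Database(_) | InternalError::Upstream(_) => "Internal error",
        }
    }
}

/// Log full detail server-side, hand only the sanitized text to the client.
pub fn to_public(err: &InternalError) -> &'static str {
    eprintln!("internal error: {err:?}"); // stand-in for tracing::error!
    err.public_message()
}
```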

## 5. Compliance Considerations

### 5.1 GDPR Compliance

**Personal Data Handling**:

- User information: username, email, IP addresses
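- Retention: Audit logs kept for the period set in section 1.3 (1 year by default, 7 years in compliance mode)
- Right to erasure: Deleting a user removes their SSH keys and user record and anonymizes (rather than deletes) their audit log entries, so the audit trail survives without PII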

**Implementation**:

```rust
// Delete user and all associated data
pub async fn delete_user(&self, user_id: &str) -> Result<(), RbacError> {
    // Delete user SSH keys
    for key in self.list_user_ssh_keys(user_id).await? {
        self.delete_ssh_key(&key.key_id).await?;
    }

    // Anonymize audit logs (retain for compliance, remove PII)
    self.anonymize_user_audit_logs(user_id).await?;

    // Delete user record
    self.delete_user_record(user_id).await?;

    Ok(())
}
```
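The anonymization step in `delete_user` can be as simple as overwriting the personal fields of each matching audit entry while keeping the entry itself. The sketch uses a simplified, hypothetical `AuditEntry` (not the `SshKeyAuditEntry` from section 1.3) and a fixed placeholder user name.

```rust
/// Hypothetical, simplified audit record.
pub struct AuditEntry {
    pub user: Option<String>,
    pub ip_address: Option<String>,
    pub action: String,
}

/// Replace PII in every entry belonging to `user_id`; returns how many changed.
/// The entries themselves (and their actions) are kept for compliance.
pub fn anonymize_user(entries: &mut [AuditEntry], user_id: &str) -> usize {
    let mut changed = 0;
    for entry in entries.iter_mut().filter(|e| e.user.as_deref() == Some(user_id)) {
        entry.user = Some("deleted-user".to_string()); // hypothetical placeholder
        entry.ip_address = None; // source IP is personal data too
        changed += 1;
    }
    changed
}
```

If audit entries are hash-chained, rewriting them in place breaks chain verification, so the two mechanisms need to be designed together (for example, by keeping personal data outside the hashed payload).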

### 5.2 SOC 2 Compliance

**Security Controls**:

- ✅ Access control (RBAC)
- ✅ Audit logging (all actions logged)
- ✅ Encryption at rest (KMS)
- ✅ Encryption in transit (HTTPS recommended)
- ✅ Session management (timeouts, lockout; MFA planned for Phase 3)

**Monitoring & Alerting**:

- ✅ Service health monitoring
- ✅ Failed login tracking
- ✅ Permission denial alerting
- ✅ Unusual activity detection

### 5.3 PCI DSS (if applicable)

**Requirements**:

- ✅ Encrypt cardholder data (use KMS for keys)
- ✅ Maintain access control (RBAC)
- ✅ Track and monitor access (audit logs)
- ✅ Regularly test security (integration tests)

## 6. Security Best Practices

### 6.1 Development

**Code Review Checklist**:

- [ ] All API endpoints have RBAC middleware
- [ ] No hardcoded secrets or keys
- [ ] Error messages don't leak sensitive info
- [ ] Audit logging for sensitive operations
- [ ] Input validation on all user inputs
- [ ] SQL injection prevention (use parameterized queries)
- [ ] XSS prevention (escape user-supplied data when rendering it)

**Testing**:

- Unit tests for permission checks
- Integration tests for RBAC enforcement
- Penetration testing for production deployments

### 6.2 Deployment

**Production Checklist**:

- [ ] Change default admin password
- [ ] Enable HTTPS with valid certificate
- [ ] Configure firewall rules (internal services only)
- [ ] Set appropriate execution mode (Enterprise for production)
- [ ] Enable audit logging
- [ ] Configure session timeout (30 minutes for Enterprise)
- [ ] Enable rate limiting
- [ ] Set up log monitoring and alerting
- [ ] Regular security updates
- [ ] Backup encryption keys

### 6.3 Operations

**Incident Response**:

1. **Detection**: Monitor audit logs for anomalies
2. **Containment**: Revoke compromised credentials
3. **Eradication**: Rotate affected SSH keys
4. **Recovery**: Restore from backup if needed
5. **Lessons Learned**: Update security controls

**Key Rotation Schedule**:

- SSH keys: Every 90 days (Enterprise: 30 days)
- JWT signing keys: Every 180 days
- Master encryption key: Every 365 days
- Service account tokens: Every 30 days

## 7. Security Metrics

### 7.1 Monitoring Metrics

**Authentication**:

- Failed login attempts per user
- Concurrent sessions per user
- Session duration (average, p95, p99)

**Authorization**:

- Permission denials per user
- Permission denials per resource
- Role assignments per day

**Audit**:

- SSH key accesses per day
- SSH key rotations per month
- Audit log retention compliance

**Services**:

- Service health check success rate
- Service response times (p50, p95, p99)
- Service dependency failures
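The p50/p95/p99 figures above need a defined percentile method. Below is a sketch of the nearest-rank method (the smallest sample such that at least p% of samples are less than or equal to it), using integer arithmetic to avoid floating-point rounding at the boundaries; `percentile` is a hypothetical helper.

```rust
/// Nearest-rank percentile for p in 1..=100; None for empty input or invalid p.
pub fn percentile(samples: &[u64], p: u32) -> Option<u64> {
    if samples.is_empty() || p == 0 || p > 100 {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    let n = sorted.len() as u64;
    // 1-based rank = ceil(p/100 * n), computed without floats.
    let rank = (u64::from(p) * n + 99) / 100;
    Some(sorted[(rank - 1) as usize])
}
```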

### 7.2 Alerting Thresholds

**Critical Alerts**:

- Service health: >3 failures in 5 minutes
- Failed logins: >10 attempts in 1 minute
- Permission denials: >50 in 1 minute
- SSH key bulk retrieval: >10 keys in 1 minute

**Warning Alerts**:

- Service degraded: response time >1 second
- Session timeout rate: >10% of sessions
- Audit log storage: >80% capacity
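Thresholds of the form "more than N events in a window" map onto a sliding-window counter. The hypothetical `RateAlert` below keeps the timestamps of recent events and reports when the count inside the window exceeds the threshold; one instance per user or metric would be needed, and `RateAlert::new(10, Duration::from_secs(60))` corresponds to the SSH key bulk-retrieval rule.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Hypothetical sliding-window alert: fires when more than `threshold`
/// events fall inside the trailing `window`.
pub struct RateAlert {
    threshold: usize,
    window: Duration,
    events: VecDeque<Instant>,
}

impl RateAlert {
    pub fn new(threshold: usize, window: Duration) -> Self {
        Self { threshold, window, events: VecDeque::new() }
    }

    /// Record one event at `now`; returns true if the alert should fire.
    pub fn record(&mut self, now: Instant) -> bool {
        // Drop events that have aged out of the window.
        while let Some(&oldest) = self.events.front() {
            if now.duration_since(oldest) >= self.window {
                self.events.pop_front();
            } else {
                break;
            }
        }
        self.events.push_back(now);
        self.events.len() > self.threshold
    }
}
```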

## 8. Security Roadmap

### Phase 1 (Completed)

- ✅ SSH key storage with encryption
- ✅ Mode-based RBAC
- ✅ Audit logging
- ✅ Platform monitoring

### Phase 2 (In Progress)

- 📋 API handlers with RBAC enforcement
- 📋 Integration tests for security
- 📋 Documentation

### Phase 3 (Future)

- Multi-factor authentication (MFA)
- Hardware security module (HSM) integration
- Advanced threat detection (ML-based)
- Automated security scanning
- Compliance report generation
- Security information and event management (SIEM) integration

## References

- **OWASP Top 10**: <https://owasp.org/www-project-top-ten/>
- **NIST Cybersecurity Framework**: <https://www.nist.gov/cyberframework>
- **CIS Controls**: <https://www.cisecurity.org/controls>
- **GDPR**: <https://gdpr.eu/>
- **SOC 2**: <https://www.aicpa.org/interestareas/frc/assuranceadvisoryservices/socforserviceorganizations.html>

---

**Last Updated**: 2025-10-06
**Review Cycle**: Quarterly
**Next Review**: 2026-01-06
