prvng_platform/crates/control-center/docs/SECURITY_CONSIDERATIONS.md
Jesús Pérez 09a97ac8f5
2026-01-08 21:32:59 +00:00

# Security Considerations for Control Center Enhancements

## Overview

This document outlines the security architecture and considerations for the control-center enhancements, including KMS SSH key management, mode-based RBAC, and platform service monitoring.

## 1. SSH Key Management Security

### 1.1 Key Storage Security

**Implementation**:

- Private keys encrypted at rest using AES-256-GCM in KMS
- Public keys stored in plaintext (they are meant to be public)
- Private key material never exposed in API responses
- Key IDs used as references, not actual keys

**Threat Mitigation**:

- **Data at Rest**: All private keys encrypted with master encryption key
- **Key Exposure**: Private keys only decrypted in memory when needed
- **Key Leakage**: Zeroization of key material after use
- **Unauthorized Access**: KMS access controlled by RBAC

**Best Practices**:

```rust
// Good: Using key ID reference
let key_id = ssh_key_manager.store_ssh_key(name, private, public, purpose, tags).await?;

// Bad: Never do this - exposing private key in logs
tracing::info!("Stored key: {}", private_key);  // DON'T DO THIS
```
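One way to make the logging mistake above harder to commit, and to cover the zeroization point, is a wrapper type for key material. The following is a minimal sketch under stated assumptions: `SecretBytes` and `expose` are hypothetical names, not part of the control-center API, and the hand-rolled wipe stands in for a maintained crate such as `zeroize`.

```rust
use std::fmt;

/// Hypothetical wrapper for private key bytes (illustration only).
pub struct SecretBytes(Vec<u8>);

impl SecretBytes {
    pub fn new(bytes: Vec<u8>) -> Self {
        SecretBytes(bytes)
    }

    /// Explicit, greppable access to the raw key material.
    pub fn expose(&self) -> &[u8] {
        &self.0
    }
}

// Debug prints only the length, never the contents.
impl fmt::Debug for SecretBytes {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "SecretBytes([REDACTED; {} bytes])", self.0.len())
    }
}

// Best-effort wipe on drop. Volatile writes keep the compiler from
// optimizing the zeroing away.
impl Drop for SecretBytes {
    fn drop(&mut self) {
        for byte in self.0.iter_mut() {
            // SAFETY: `byte` is a valid, aligned, exclusive reference.
            unsafe { std::ptr::write_volatile(byte, 0) };
        }
    }
}
```

Because the type deliberately has no `Display` implementation, a call like `tracing::info!("Stored key: {}", key)` fails to compile, and `{:?}` prints only the redacted form.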

### 1.2 Key Rotation Security

**Implementation**:

- Configurable rotation intervals (default 90 days)
- Grace period for old key usage (default 7 days)
- Automatic rotation scheduling (if enabled)
- Manual rotation support with immediate effect

**Threat Mitigation**:

- **Key Compromise**: Regular rotation limits exposure window
- **Stale Keys**: Automated detection of keys due for rotation
- **Rotation Failures**: Graceful degradation with error logging

**Rotation Policy**:

```toml
[kms.ssh_keys]
rotation_enabled = true
rotation_interval_days = 90   # Enterprise: 30, Dev: 180
grace_period_days = 7          # Time to update deployed keys
auto_rotate = false            # Manual approval recommended
```
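The policy can be read as two separate checks: whether a key is old enough to rotate, and whether a rotated-out key is still accepted. A minimal sketch, assuming the grace period is counted from the moment of rotation and that both boundaries are inclusive as written below (the source does not specify either); the struct and function names are hypothetical:

```rust
/// Mirrors the `[kms.ssh_keys]` settings; field names follow the TOML keys.
pub struct RotationPolicy {
    pub rotation_enabled: bool,
    pub rotation_interval_days: u32,
    pub grace_period_days: u32,
}

/// True once the key has reached the configured rotation interval.
pub fn is_due_for_rotation(key_age_days: u32, policy: &RotationPolicy) -> bool {
    policy.rotation_enabled && key_age_days >= policy.rotation_interval_days
}

/// True while a rotated-out key is still inside the grace period
/// (assumption: the grace period starts at the time of rotation).
pub fn old_key_still_accepted(days_since_rotation: u32, policy: &RotationPolicy) -> bool {
    days_since_rotation < policy.grace_period_days
}
```

With the defaults above, a key becomes due on day 90, and the key it replaces stops being accepted 7 days after the rotation.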

### 1.3 Audit Logging

**Logged Events**:

- SSH key creation (who, when, purpose)
- SSH key retrieval (who accessed, when)
- SSH key rotation (old key ID, new key ID)
- SSH key deletion (who deleted, when)
- Failed access attempts

**Audit Entry Structure**:

```rust
pub struct SshKeyAuditEntry {
    pub timestamp: DateTime<Utc>,
    pub key_id: String,
    pub action: SshKeyAction,
    pub user: Option<String>,      // User who performed action
    pub ip_address: Option<String>, // Source IP
    pub success: bool,
    pub error_message: Option<String>,
}
```

**Threat Mitigation**:

- **Unauthorized Access**: Full audit trail for forensics
- **Insider Threats**: User attribution for all actions
- **Compliance**: Supports GDPR and SOC 2 audit log requirements

**Audit Log Retention**:

- In-memory: Last 10,000 entries
- Persistent: SurrealDB with 1-year retention
- Compliance mode: 7-year retention (configurable)
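The in-memory tier behaves like a bounded queue that drops the oldest entry once the cap is reached. A minimal sketch, assuming eviction is strictly oldest-first; `AuditBuffer` is a hypothetical name, and persisting evicted entries to SurrealDB is left to the caller:

```rust
use std::collections::VecDeque;

/// Bounded in-memory buffer that keeps only the most recent entries.
pub struct AuditBuffer<T> {
    entries: VecDeque<T>,
    capacity: usize,
}

impl<T> AuditBuffer<T> {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        AuditBuffer { entries: VecDeque::with_capacity(capacity), capacity }
    }

    /// Appends an entry and returns the evicted oldest entry, if any,
    /// so the caller can make sure it has been persisted.
    pub fn push(&mut self, entry: T) -> Option<T> {
        let evicted = if self.entries.len() >= self.capacity {
            self.entries.pop_front()
        } else {
            None
        };
        self.entries.push_back(entry);
        evicted
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }

    /// Iterates from oldest to newest.
    pub fn iter(&self) -> impl Iterator<Item = &T> {
        self.entries.iter()
    }
}
```

Constructing it with `AuditBuffer::new(10_000)` matches the in-memory limit listed above.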

### 1.4 Key Fingerprinting

**Implementation**:

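```rust
fn calculate_fingerprint(public_key: &[u8]) -> Result<String, KmsError> {
    use base64::{engine::general_purpose::STANDARD_NO_PAD, Engine as _};
    use sha2::{Digest, Sha256};

    // `public_key` must be the decoded key blob, not the text of the `.pub` line.
    let digest = Sha256::digest(public_key);

    // Encode the full 32-byte digest, unpadded, to match `ssh-keygen -l -E sha256`.
    // Truncating the digest weakens collision resistance and breaks that match.
    Ok(format!("SHA256:{}", STANDARD_NO_PAD.encode(digest)))
}
```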

**Security Benefits**:

- Verify key integrity
- Detect key tampering
- Match deployed keys to KMS records

## 2. RBAC Security

### 2.1 Execution Modes

**Security Model by Mode**:

| Mode | Security Level | Use Case | Audit Required |
|------|---------------|----------|----------------|
| Solo | Low | Single developer | No |
| MultiUser | Medium | Small teams | Optional |
| CICD | Medium | Automation | Yes |
| Enterprise | High | Production | Mandatory |
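The audit column of the table can be encoded directly, so callers do not re-derive it per mode. A minimal sketch: `AuditRequirement`, `audit_requirement`, and `should_audit` are hypothetical names, and treating the CICD "Yes" as mandatory is an assumption based on the CICD feature list later in this section.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExecutionMode {
    Solo,
    MultiUser,
    CICD,
    Enterprise,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AuditRequirement {
    NotRequired,
    Optional,
    Mandatory,
}

impl ExecutionMode {
    /// Maps each mode to the "Audit Required" column of the table.
    pub fn audit_requirement(self) -> AuditRequirement {
        match self {
            ExecutionMode::Solo => AuditRequirement::NotRequired,
            ExecutionMode::MultiUser => AuditRequirement::Optional,
            ExecutionMode::CICD | ExecutionMode::Enterprise => AuditRequirement::Mandatory,
        }
    }

    /// Mandatory modes always audit; the others audit only when opted in.
    pub fn should_audit(self, audit_opt_in: bool) -> bool {
        match self.audit_requirement() {
            AuditRequirement::Mandatory => true,
            AuditRequirement::Optional | AuditRequirement::NotRequired => audit_opt_in,
        }
    }
}
```

Keeping the mapping in one `match` means a new mode cannot be added without the compiler asking for its audit requirement.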

**Mode-Specific Security**:

#### Solo Mode

```rust
// Solo mode: All users are admin
// Security: Trust-based, no RBAC checks
if mode == ExecutionMode::Solo {
    return true;  // Allow all operations
}
```

**Risks**:

- No access control
- No audit trail
- Single point of failure

**Mitigations**:

- Only for development environments
- Network isolation required
- Regular backups

#### MultiUser Mode

```rust
// Multi-user: Role-based access control
let permissions = rbac_manager.get_user_permissions(&user).await;
if !permissions.contains(&required_permission) {
    return Err(RbacError::PermissionDenied);
}
```

**Security Features**:

- Role-based permissions
- Optional audit logging
- Session management

#### CICD Mode

```rust
// CICD: Service account focused
// All actions logged for automation tracking
if mode == ExecutionMode::CICD {
    audit_log.log_automation_action(service_account, action).await;
}
```

**Security Features**:

- Service account isolation
- Mandatory audit logging
- Token-based authentication
- Short-lived credentials

#### Enterprise Mode

```rust
// Enterprise: Full security
// - Mandatory audit logging
// - Stricter session timeouts
// - Compliance reports
if mode == ExecutionMode::Enterprise {
    audit_log.log_with_compliance(user, action, compliance_tags).await;
}
```

**Security Features**:

- Full RBAC enforcement
- Comprehensive audit logging
- Compliance reporting
- Role assignment approval workflow

### 2.2 Permission System

**Permission Levels**:

```rust
Role::Admin         => 100  // Full access
Role::Operator      =>  80  // Deploy & manage
Role::Developer     =>  60  // Read + dev deploy
Role::ServiceAccount =>  50  // Automation
Role::Auditor       =>  40  // Read + audit
Role::Viewer        =>  20  // Read-only
```

**Action Security Levels**:

```rust
Action::Delete      => 100  // Destructive, admin only
Action::Manage      =>  80  // Service management
Action::Deploy      =>  80  // Deploy to production
Action::Create      =>  60  // Create resources
Action::Update      =>  60  // Modify resources
Action::Execute     =>  50  // Execute operations
Action::Audit       =>  40  // View audit logs
Action::Read        =>  20  // View resources
```

**Permission Check**:

```rust
pub fn can_perform(&self, required_level: u8) -> bool {
    self.permission_level() >= required_level
}
```
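Put together, the role levels, action levels, and the `can_perform` check above can be sketched as one self-contained unit. The numeric values are taken from the listings; the `required_level` method and `is_allowed` helper are hypothetical names added for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role { Admin, Operator, Developer, ServiceAccount, Auditor, Viewer }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action { Delete, Manage, Deploy, Create, Update, Execute, Audit, Read }

impl Role {
    pub fn permission_level(self) -> u8 {
        match self {
            Role::Admin => 100,
            Role::Operator => 80,
            Role::Developer => 60,
            Role::ServiceAccount => 50,
            Role::Auditor => 40,
            Role::Viewer => 20,
        }
    }

    pub fn can_perform(self, required_level: u8) -> bool {
        self.permission_level() >= required_level
    }
}

impl Action {
    pub fn required_level(self) -> u8 {
        match self {
            Action::Delete => 100,
            Action::Manage | Action::Deploy => 80,
            Action::Create | Action::Update => 60,
            Action::Execute => 50,
            Action::Audit => 40,
            Action::Read => 20,
        }
    }
}

/// Hypothetical helper: a role may perform an action when its level
/// meets or exceeds the level the action requires.
pub fn is_allowed(role: Role, action: Action) -> bool {
    role.can_perform(action.required_level())
}
```

Two consequences of a purely numeric hierarchy are worth noting. A Developer (60) does not reach the Deploy threshold (80), so the "dev deploy" capability noted for that role would need a separate, environment-scoped rule. And every role at or above 40, not only Auditor, can view audit logs.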

**Security Guarantees**:

- Least privilege by default (Viewer role)
- Hierarchical permissions (higher roles include lower)
- Explicit deny for unknown resources
- No permission escalation without admin

### 2.3 Session Security

**Session Configuration**:

```toml
[security]
session_timeout_minutes = 60     # Solo/MultiUser; set to 30 for Enterprise
max_sessions_per_user = 5
failed_login_lockout_attempts = 5
failed_login_lockout_duration_minutes = 15
```

**Session Lifecycle**:

1. User authenticates → JWT token issued
2. Token includes: user_id, role, issued_at, expires_at
3. Middleware validates token on each request
4. Session tracked in Redis/RocksDB
5. Session invalidated on logout or timeout
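Step 3 includes a time check on the claims from step 2. A minimal sketch of that check alone, assuming `issued_at` and `expires_at` are Unix timestamps in seconds; the `Claims` layout and `TokenError` type are hypothetical, and signature verification and the session-store lookup are separate steps not shown here:

```rust
/// Claims carried in the token, per the lifecycle above.
pub struct Claims {
    pub user_id: String,
    pub role: String,
    pub issued_at: u64,  // Unix seconds (assumption)
    pub expires_at: u64, // Unix seconds (assumption)
}

#[derive(Debug, PartialEq, Eq)]
pub enum TokenError {
    NotYetValid,
    Expired,
}

/// Checks only the validity window of already-verified claims.
pub fn validate_times(claims: &Claims, now: u64) -> Result<(), TokenError> {
    if now < claims.issued_at {
        return Err(TokenError::NotYetValid);
    }
    if now >= claims.expires_at {
        return Err(TokenError::Expired);
    }
    Ok(())
}
```

Passing `now` as a parameter rather than reading the clock inside the function keeps the check deterministic in tests.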

**Security Features**:

- JWT with RSA-2048 signatures
- Refresh token rotation
- Session fixation prevention
- Concurrent session limits

**Threat Mitigation**:

- **Session Hijacking**: Short-lived tokens (1 hour)
- **Token Replay**: One-time refresh tokens
- **Brute Force**: Account lockout after 5 failures
- **Session Fixation**: New session ID on login
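The brute-force mitigation can be sketched as a per-user failure counter with a timed lockout. This is an illustration under stated assumptions, not the control-center implementation: `LoginTracker` and its methods are hypothetical, state is kept in memory, and the counter resets when a lockout starts.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

pub struct LockoutPolicy {
    pub max_attempts: u32, // failed_login_lockout_attempts
    pub lockout: Duration, // failed_login_lockout_duration_minutes
}

#[derive(Default)]
struct Entry {
    failures: u32,
    locked_until: Option<Instant>,
}

pub struct LoginTracker {
    policy: LockoutPolicy,
    entries: HashMap<String, Entry>,
}

impl LoginTracker {
    pub fn new(policy: LockoutPolicy) -> Self {
        LoginTracker { policy, entries: HashMap::new() }
    }

    /// Callers should check this before verifying credentials.
    pub fn is_locked(&self, user: &str, now: Instant) -> bool {
        self.entries
            .get(user)
            .and_then(|e| e.locked_until)
            .map_or(false, |until| now < until)
    }

    /// Counts a failed login and starts a lockout at the threshold.
    pub fn record_failure(&mut self, user: &str, now: Instant) {
        let entry = self.entries.entry(user.to_string()).or_default();
        entry.failures += 1;
        if entry.failures >= self.policy.max_attempts {
            entry.locked_until = Some(now + self.policy.lockout);
            entry.failures = 0;
        }
    }

    /// A successful login clears the user's failure history.
    pub fn record_success(&mut self, user: &str) {
        self.entries.remove(user);
    }
}
```

With `max_attempts: 5` and a 15-minute `lockout`, the fifth consecutive failure locks the account until 15 minutes after that attempt.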

### 2.4 Middleware Security

**RBAC Middleware Flow**:

```plaintext
Request → Auth Middleware → RBAC Middleware → Handler
               ↓                  ↓
        Extract User      Check Permission
        from JWT Token    (role + resource + action)
                                  ↓
                         Allow / Deny
```

**Middleware Implementation**:

```rust
pub async fn check_permission(
    State(state): State<Arc<RbacMiddleware>>,
    resource: Resource,
    action: Action,
    req: Request,
    next: Next,
) -> Result<Response, RbacError> {
    // The auth middleware is expected to have inserted the authenticated user.
    // Clone it so no borrow of `req` is held across the await below.
    let user = req
        .extensions()
        .get::<User>()
        .cloned()
        .ok_or_else(|| RbacError::UserNotFound("No user in request".to_string()))?;

    // Deny before the handler runs if the role lacks the required permission.
    if !state.rbac_manager.check_permission(&user, resource, action).await {
        return Err(RbacError::PermissionDenied);
    }

    Ok(next.run(req).await)
}
```

**Security Guarantees**:

- All API endpoints protected by default
- Permission checked before handler execution
- User context available in handlers
- Failed checks logged for audit

## 3. Platform Monitoring Security

### 3.1 Service Access Security

**Internal URLs Only**:

```toml
[platform]
orchestrator_url = "http://localhost:9090"       # Not exposed externally
coredns_url = "http://localhost:9153"
gitea_url = "http://localhost:3000"
oci_registry_url = "http://localhost:5000"
```

**Network Security**:

- All services on localhost or internal network
- No external exposure of monitoring endpoints
- Firewall rules to prevent external access

**Threat Mitigation**:

- **External Scanning**: Services not reachable from internet
- **DDoS**: Internal-only access limits attack surface
- **Data Exfiltration**: Monitoring data not exposed externally

### 3.2 Health Check Security

**Timeout Protection**:

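```rust
let client = Client::builder()
    .timeout(std::time::Duration::from_secs(5))  // Prevent hanging
    .build()
    // Fail fast at startup with a clear message instead of a bare unwrap
    .expect("failed to build health-check HTTP client");
```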

**Error Handling**:

```rust
// Never expose internal errors to users
Err(e) => {
    // Log detailed error internally
    tracing::error!("Health check failed for {}: {}", service, e);

    // Return generic error externally
    ServiceStatus {
        status: HealthStatus::Unhealthy,
        error_message: Some("Service unavailable".to_string()),  // Generic
        ..
    }
}
```

**Threat Mitigation**:

- **Timeout Attacks**: 5-second timeout prevents resource exhaustion
- **Information Disclosure**: Error messages sanitized
- **Resource Exhaustion**: Parallel checks with concurrency limits
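The concurrency limit on parallel checks can be illustrated with standard-library threads. This is a simplified stand-in, not the service's async implementation: `check_all` is a hypothetical function, the `check` closure stands in for the real HTTP probe, and batching is a coarser limit than a semaphore because each batch waits for its slowest member.

```rust
use std::thread;

/// Runs `check` for every service with at most `max_parallel` probes
/// in flight at once. Results keep the input order.
pub fn check_all<F>(services: &[&str], max_parallel: usize, check: F) -> Vec<(String, bool)>
where
    F: Fn(&str) -> bool + Sync,
{
    let mut results = Vec::with_capacity(services.len());
    // Process the services in batches no larger than the limit.
    for batch in services.chunks(max_parallel.max(1)) {
        let batch_results: Vec<(String, bool)> = thread::scope(|scope| {
            let handles: Vec<_> = batch
                .iter()
                .copied()
                .map(|name| {
                    let check = &check;
                    scope.spawn(move || (name.to_string(), check(name)))
                })
                .collect();
            handles
                .into_iter()
                .map(|h| h.join().expect("health check thread panicked"))
                .collect()
        });
        results.extend(batch_results);
    }
    results
}
```

In an async service the same bound is more commonly enforced with a semaphore or a bounded stream combinator, which avoids the per-batch wait.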

### 3.3 Service Control Security

**RBAC-Protected Service Control**:

```rust
// Only Operator or Admin can start/stop services
#[axum::debug_handler]
pub async fn start_service(
    State(state): State<AppState>,
    Extension(user): Extension<User>,
    Path(service_type): Path<String>,
) -> Result<StatusCode, ApiError> {
    // Check permission before touching the service.
    // Assumes AppState exposes the managers as fields.
    if !state.rbac_manager.check_permission(
        &user,
        Resource::Service,
        Action::Manage,
    ).await {
        return Err(ApiError::PermissionDenied);
    }

    // Start service
    state.service_manager.start_service(&service_type).await?;

    // Audit log (reached only when the start succeeded)
    state.audit_log.log_service_action(user, service_type, "start").await;

    Ok(StatusCode::OK)
}
```

**Security Guarantees**:

- Only authorized users can control services
- All service actions logged
- Graceful degradation on service failure

## 4. Threat Model

### 4.1 High-Risk Threats

#### Threat: SSH Private Key Exposure

**Attack Vector**: Attacker gains access to KMS database

**Mitigations**:

- Private keys encrypted at rest with master key
- Master key stored in an external KMS, or in a hardware security module (HSM) once the Phase 3 HSM integration is available
- Key access audited and rate-limited
- Zeroization of decrypted keys in memory

**Detection**:

- Audit log monitoring for unusual key access patterns
- Alerting on bulk key retrievals

#### Threat: Privilege Escalation

**Attack Vector**: Lower-privileged user attempts to gain admin access

**Mitigations**:

- Role assignment requires Admin role
- Mode switching requires Admin role
- Middleware enforces permissions on every request
- No client-side permission checks (server-side only)

**Detection**:

- Failed permission checks logged
- Alerting on repeated permission denials

#### Threat: Session Hijacking

**Attack Vector**: Attacker steals JWT token

**Mitigations**:

- Short-lived access tokens (1 hour)
- Refresh token rotation
- Cookies marked `Secure` and `HttpOnly` (recommended)
- IP address binding (optional)

**Detection**:

- Unusual login locations
- Concurrent sessions from different IPs

### 4.2 Medium-Risk Threats

#### Threat: Service Impersonation

**Attack Vector**: Malicious service pretends to be legitimate platform service

**Mitigations**:

- Service URLs configured in config file (not dynamic)
- TLS certificate validation (if HTTPS)
- Service authentication tokens

**Detection**:

- Health check failures
- Metrics anomalies

#### Threat: Audit Log Tampering

**Attack Vector**: Attacker modifies audit logs to hide tracks

**Mitigations**:

- Audit logs are append-only
- Logs persisted in SurrealDB; tamper evidence comes from the hash chain, not from the database itself
- Hash chain for log integrity
- Offsite log backup

**Detection**:

- Hash chain verification
- Log gap detection
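Hash chain verification and gap detection can be sketched as follows. This is an illustration only: the names are hypothetical, and `DefaultHasher` from the standard library is used purely to keep the example dependency-free. It is not cryptographic; a real chain would use SHA-256 or similar.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

pub struct ChainedEntry {
    pub message: String,
    pub prev_hash: u64,
    pub hash: u64,
}

/// Stand-in link hash over (previous hash, message).
/// NOT cryptographic; replace with SHA-256 in practice.
fn link_hash(prev_hash: u64, message: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    prev_hash.hash(&mut hasher);
    message.hash(&mut hasher);
    hasher.finish()
}

/// Appends an entry linked to the current tail of the chain.
pub fn append(chain: &mut Vec<ChainedEntry>, message: &str) {
    let prev_hash = chain.last().map_or(0, |e| e.hash);
    chain.push(ChainedEntry {
        message: message.to_string(),
        prev_hash,
        hash: link_hash(prev_hash, message),
    });
}

/// Returns the index of the first entry that fails verification.
pub fn verify(chain: &[ChainedEntry]) -> Result<(), usize> {
    let mut expected_prev = 0;
    for (index, entry) in chain.iter().enumerate() {
        let link_ok = entry.prev_hash == expected_prev;
        let hash_ok = entry.hash == link_hash(entry.prev_hash, &entry.message);
        if !link_ok || !hash_ok {
            return Err(index);
        }
        expected_prev = entry.hash;
    }
    Ok(())
}
```

Editing an entry breaks its own hash, and deleting one breaks the `prev_hash` link of its successor, so both tampering and gaps surface at a specific index.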

### 4.3 Low-Risk Threats

#### Threat: Information Disclosure via Error Messages

**Attack Vector**: Error messages leak internal information

**Mitigations**:

- Generic error messages for users
- Detailed errors only in server logs
- Error message sanitization

**Detection**:

- Code review for error handling
- Automated scanning for sensitive data in responses

## 5. Compliance Considerations

### 5.1 GDPR Compliance

**Personal Data Handling**:

- User information: username, email, IP addresses
- Retention: Audit logs kept for required period
- Right to erasure: User deletion removes the user record and SSH keys; audit log entries are retained but anonymized

**Implementation**:

```rust
// Delete user and all associated data
pub async fn delete_user(&self, user_id: &str) -> Result<(), RbacError> {
    // Delete user SSH keys
    for key in self.list_user_ssh_keys(user_id).await? {
        self.delete_ssh_key(&key.key_id).await?;
    }

    // Anonymize audit logs (retain for compliance, remove PII)
    self.anonymize_user_audit_logs(user_id).await?;

    // Delete user record
    self.delete_user_record(user_id).await?;

    Ok(())
}
```
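The anonymization step can be sketched as clearing the identifying fields while keeping the event itself. This assumes `user` and `ip_address` are the only PII fields, as in the audit entry structure from section 1.3; `AuditEntry` here is a trimmed, hypothetical copy of that struct and `anonymize_user` is an illustrative name.

```rust
/// Trimmed copy of the audit entry fields relevant to anonymization.
pub struct AuditEntry {
    pub key_id: String,
    pub user: Option<String>,
    pub ip_address: Option<String>,
    pub success: bool,
}

/// Clears PII on every entry attributed to `user_id` and returns how
/// many entries were changed. The events themselves are retained.
pub fn anonymize_user(entries: &mut [AuditEntry], user_id: &str) -> usize {
    let mut changed = 0;
    for entry in entries
        .iter_mut()
        .filter(|e| e.user.as_deref() == Some(user_id))
    {
        entry.user = None;
        entry.ip_address = None;
        changed += 1;
    }
    changed
}
```

Whether retained entries should carry a stable pseudonym instead of `None` depends on the forensic needs of the deployment and is left open here.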

### 5.2 SOC 2 Compliance

**Security Controls**:

- Access control (RBAC)
- Audit logging (all actions logged)
- Encryption at rest (KMS)
- Encryption in transit (HTTPS recommended)
- Session management (timeout; MFA planned for Phase 3)

**Monitoring & Alerting**:

- Service health monitoring
- Failed login tracking
- Permission denial alerting
- Unusual activity detection

### 5.3 PCI DSS (if applicable)

**Requirements**:

- Encrypt cardholder data (use KMS for keys)
- Maintain access control (RBAC)
- Track and monitor access (audit logs)
- Regularly test security (integration tests)

## 6. Security Best Practices

### 6.1 Development

**Code Review Checklist**:

- [ ] All API endpoints have RBAC middleware
- [ ] No hardcoded secrets or keys
- [ ] Error messages don't leak sensitive info
- [ ] Audit logging for sensitive operations
- [ ] Input validation on all user inputs
- [ ] SQL injection prevention (use parameterized queries)
- [ ] XSS prevention (escape user-supplied data on output)

**Testing**:

- Unit tests for permission checks
- Integration tests for RBAC enforcement
- Penetration testing for production deployments

### 6.2 Deployment

**Production Checklist**:

- [ ] Change default admin password
- [ ] Enable HTTPS with valid certificate
- [ ] Configure firewall rules (internal services only)
- [ ] Set appropriate execution mode (Enterprise for production)
- [ ] Enable audit logging
- [ ] Configure session timeout (30 minutes for Enterprise)
- [ ] Enable rate limiting
- [ ] Set up log monitoring and alerting
- [ ] Regular security updates
- [ ] Backup encryption keys

### 6.3 Operations

**Incident Response**:

1. **Detection**: Monitor audit logs for anomalies
2. **Containment**: Revoke compromised credentials
3. **Eradication**: Rotate affected SSH keys
4. **Recovery**: Restore from backup if needed
5. **Lessons Learned**: Update security controls

**Key Rotation Schedule**:

- SSH keys: Every 90 days (Enterprise: 30 days)
- JWT signing keys: Every 180 days
- Master encryption key: Every 365 days
- Service account tokens: Every 30 days

## 7. Security Metrics

### 7.1 Monitoring Metrics

**Authentication**:

- Failed login attempts per user
- Concurrent sessions per user
- Session duration (average, p95, p99)

**Authorization**:

- Permission denials per user
- Permission denials per resource
- Role assignments per day

**Audit**:

- SSH key accesses per day
- SSH key rotations per month
- Audit log retention compliance

**Services**:

- Service health check success rate
- Service response times (p50, p95, p99)
- Service dependency failures

### 7.2 Alerting Thresholds

**Critical Alerts**:

- Service health: >3 failures in 5 minutes
- Failed logins: >10 attempts in 1 minute
- Permission denials: >50 in 1 minute
- SSH key bulk retrieval: >10 keys in 1 minute

**Warning Alerts**:

- Service degraded: response time >1 second
- Session timeout rate: >10% of sessions
- Audit log storage: >80% capacity
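Thresholds of the form "more than N events in M minutes" can be evaluated with a sliding window over event timestamps. A minimal sketch with hypothetical names; it assumes one `RateAlert` instance per metric and that events are recorded in time order:

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Sliding-window counter for "more than `threshold` events in `window`".
pub struct RateAlert {
    threshold: usize,
    window: Duration,
    events: VecDeque<Instant>,
}

impl RateAlert {
    pub fn new(threshold: usize, window: Duration) -> Self {
        RateAlert { threshold, window, events: VecDeque::new() }
    }

    /// Records an event at `now` and returns true when the number of
    /// events inside the window exceeds the threshold.
    pub fn record(&mut self, now: Instant) -> bool {
        self.events.push_back(now);
        // Drop events that have aged out of the window.
        while let Some(&oldest) = self.events.front() {
            if now.duration_since(oldest) >= self.window {
                self.events.pop_front();
            } else {
                break;
            }
        }
        self.events.len() > self.threshold
    }
}
```

For the failed-login alert above this would be `RateAlert::new(10, Duration::from_secs(60))`; the eleventh failure inside one minute returns `true`.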

## 8. Security Roadmap

### Phase 1 (Completed)

- SSH key storage with encryption
- Mode-based RBAC
- Audit logging
- Platform monitoring

### Phase 2 (In Progress)

- 📋 API handlers with RBAC enforcement
- 📋 Integration tests for security
- 📋 Documentation

### Phase 3 (Future)

- Multi-factor authentication (MFA)
- Hardware security module (HSM) integration
- Advanced threat detection (ML-based)
- Automated security scanning
- Compliance report generation
- Security information and event management (SIEM) integration

## References

- **OWASP Top 10**: <https://owasp.org/www-project-top-ten/>
- **NIST Cybersecurity Framework**: <https://www.nist.gov/cyberframework>
- **CIS Controls**: <https://www.cisecurity.org/controls>
- **GDPR**: <https://gdpr.eu/>
- **SOC 2**: <https://www.aicpa.org/interestareas/frc/assuranceadvisoryservices/socforserviceorganizations.html>

---

**Last Updated**: 2025-10-06
**Review Cycle**: Quarterly
**Next Review**: 2026-01-06