prvng_platform/control-center/docs/SECURITY_CONSIDERATIONS.md
2025-10-07 10:59:52 +01:00

Security Considerations for Control Center Enhancements

Overview

This document outlines the security architecture and considerations for the control-center enhancements, including KMS SSH key management, mode-based RBAC, and platform service monitoring.

1. SSH Key Management Security

1.1 Key Storage Security

Implementation:

  • Private keys encrypted at rest using AES-256-GCM in KMS
  • Public keys stored in plaintext (as they are meant to be public)
  • Private key material never exposed in API responses
  • Key IDs used as references, not actual keys

Threat Mitigation:

  • Data at Rest: All private keys encrypted with master encryption key
  • Key Exposure: Private keys only decrypted in memory when needed
  • Key Leakage: Zeroization of key material after use
  • Unauthorized Access: KMS access controlled by RBAC

Best Practices:

// Good: Using key ID reference
let key_id = ssh_key_manager.store_ssh_key(name, private_key, public_key, purpose, tags).await?;
tracing::info!("Stored key: {}", key_id);

// Bad: Never do this - exposing private key in logs
tracing::info!("Stored key: {}", private_key);  // DON'T DO THIS
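
One way to make the "never log key material" rule harder to break is to wrap private key bytes in a type that redacts itself when formatted and wipes its buffer on drop. The sketch below is a minimal illustration using only the standard library; `SecretBytes` and `expose` are hypothetical names, not part of the control-center code, and a production build would more likely rely on a dedicated crate such as `zeroize`.

```rust
use std::fmt;

/// Hypothetical wrapper for private key bytes (not part of the
/// control-center code): redacts itself in logs and wipes on drop.
pub struct SecretBytes(Vec<u8>);

impl SecretBytes {
    pub fn new(bytes: Vec<u8>) -> Self {
        SecretBytes(bytes)
    }

    /// Explicit, easy-to-grep access to the raw key material.
    pub fn expose(&self) -> &[u8] {
        &self.0
    }
}

impl fmt::Debug for SecretBytes {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Never print the contents, even with {:?}.
        f.write_str("SecretBytes([REDACTED])")
    }
}

impl Drop for SecretBytes {
    fn drop(&mut self) {
        for byte in self.0.iter_mut() {
            // Volatile write so the compiler does not optimize the wipe away.
            unsafe { std::ptr::write_volatile(byte, 0) };
        }
    }
}
```

Because the type does not implement `Display`, a call such as `tracing::info!("{}", secret)` would fail to compile, and `{:?}` prints only the redaction marker.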

1.2 Key Rotation Security

Implementation:

  • Configurable rotation intervals (default 90 days)
  • Grace period for old key usage (default 7 days)
  • Automatic rotation scheduling (if enabled)
  • Manual rotation support with immediate effect

Threat Mitigation:

  • Key Compromise: Regular rotation limits exposure window
  • Stale Keys: Automated detection of keys due for rotation
  • Rotation Failures: Graceful degradation with error logging

Rotation Policy:

[kms.ssh_keys]
rotation_enabled = true
rotation_interval_days = 90   # Enterprise: 30, Dev: 180
grace_period_days = 7          # Time to update deployed keys
auto_rotate = false            # Manual approval recommended
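
As a rough illustration of how these two settings interact, the sketch below models the policy as plain day counts. `RotationPolicy`, `is_due`, and `old_key_accepted` are hypothetical names, and treating the grace period as an exclusive upper bound is an assumption rather than documented behavior.

```rust
/// Mirrors rotation_interval_days and grace_period_days from the config
/// above. The type and method names are hypothetical.
#[derive(Debug, Clone, Copy)]
pub struct RotationPolicy {
    pub rotation_interval_days: u64,
    pub grace_period_days: u64,
}

impl RotationPolicy {
    /// A key is due for rotation once its age reaches the interval.
    pub fn is_due(&self, key_age_days: u64) -> bool {
        key_age_days >= self.rotation_interval_days
    }

    /// After rotation, the old key stays usable for `grace_period_days`
    /// days (days 0 through grace_period_days - 1; assumed boundary).
    pub fn old_key_accepted(&self, days_since_rotation: u64) -> bool {
        days_since_rotation < self.grace_period_days
    }
}
```

With the defaults (90 and 7), a key is flagged on day 90, and the key it replaces stops being accepted one week after the rotation.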

1.3 Audit Logging

Logged Events:

  • SSH key creation (who, when, purpose)
  • SSH key retrieval (who accessed, when)
  • SSH key rotation (old key ID, new key ID)
  • SSH key deletion (who deleted, when)
  • Failed access attempts

Audit Entry Structure:

pub struct SshKeyAuditEntry {
    pub timestamp: DateTime<Utc>,
    pub key_id: String,
    pub action: SshKeyAction,
    pub user: Option<String>,      // User who performed action
    pub ip_address: Option<String>, // Source IP
    pub success: bool,
    pub error_message: Option<String>,
}

Threat Mitigation:

  • Unauthorized Access: Full audit trail for forensics
  • Insider Threats: User attribution for all actions
  • Compliance: Supports GDPR and SOC 2 audit logging requirements

Audit Log Retention:

  • In-memory: Last 10,000 entries
  • Persistent: SurrealDB with 1-year retention
  • Compliance mode: 7-year retention (configurable)
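
The in-memory tier can be pictured as a bounded queue that evicts its oldest entry once the cap is reached. This is a minimal sketch under that assumption; `AuditBuffer` is a hypothetical name, and returning the evicted entry (so a caller could persist it) is a design choice of the example, not a description of the actual implementation.

```rust
use std::collections::VecDeque;

/// Hypothetical bounded in-memory audit buffer (e.g. capacity 10_000).
pub struct AuditBuffer<T> {
    entries: VecDeque<T>,
    capacity: usize,
}

impl<T> AuditBuffer<T> {
    pub fn new(capacity: usize) -> Self {
        AuditBuffer { entries: VecDeque::with_capacity(capacity), capacity }
    }

    /// Appends an entry. If the buffer is full, the oldest entry is
    /// evicted and returned so the caller can hand it to persistent storage.
    pub fn push(&mut self, entry: T) -> Option<T> {
        if self.capacity == 0 {
            return Some(entry); // nothing can be retained
        }
        let evicted = if self.entries.len() >= self.capacity {
            self.entries.pop_front()
        } else {
            None
        };
        self.entries.push_back(entry);
        evicted
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }

    /// Iterates from oldest to newest.
    pub fn iter(&self) -> impl Iterator<Item = &T> {
        self.entries.iter()
    }
}
```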

1.4 Key Fingerprinting

Implementation:

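fn calculate_fingerprint(public_key: &[u8]) -> Result<String, KmsError> {
    use base64::{engine::general_purpose::STANDARD_NO_PAD, Engine as _};
    use sha2::{Digest, Sha256};

    // Hash the decoded public key blob and keep the full 32-byte digest.
    // Truncating the digest weakens collision resistance and would not match
    // the unpadded SHA256:<base64> fingerprints printed by OpenSSH tools.
    let digest = Sha256::digest(public_key);
    Ok(format!("SHA256:{}", STANDARD_NO_PAD.encode(digest)))
}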

Security Benefits:

  • Verify key integrity
  • Detect key tampering
  • Match deployed keys to KMS records

2. RBAC Security

2.1 Execution Modes

Security Model by Mode:

Mode        Security Level   Use Case           Audit Required
Solo        Low              Single developer   No
MultiUser   Medium           Small teams        Optional
CICD        Medium           Automation         Yes
Enterprise  High             Production         Mandatory
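
The audit column of the table above can be encoded directly so that callers do not re-derive it per mode. The following is a sketch only: `ExecutionMode` and its variants appear in this document, while `AuditRequirement`, `audit_requirement`, and `must_audit` are hypothetical names introduced for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExecutionMode {
    Solo,
    MultiUser,
    CICD,
    Enterprise,
}

/// Hypothetical encoding of the "Audit Required" column.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AuditRequirement {
    NotRequired,
    Optional,
    Required,
    Mandatory,
}

impl ExecutionMode {
    pub fn audit_requirement(self) -> AuditRequirement {
        match self {
            ExecutionMode::Solo => AuditRequirement::NotRequired,
            ExecutionMode::MultiUser => AuditRequirement::Optional,
            ExecutionMode::CICD => AuditRequirement::Required,
            ExecutionMode::Enterprise => AuditRequirement::Mandatory,
        }
    }

    /// True when audit logging cannot be switched off in this mode.
    pub fn must_audit(self) -> bool {
        matches!(
            self.audit_requirement(),
            AuditRequirement::Required | AuditRequirement::Mandatory
        )
    }
}
```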

Mode-Specific Security:

Solo Mode

// Solo mode: All users are admin
// Security: Trust-based, no RBAC checks
if mode == ExecutionMode::Solo {
    return true;  // Allow all operations
}

Risks:

  • No access control
  • No audit trail
  • Single point of failure

Mitigations:

  • Only for development environments
  • Network isolation required
  • Regular backups

MultiUser Mode

// Multi-user: Role-based access control
let permissions = rbac_manager.get_user_permissions(&user).await;
if !permissions.contains(&required_permission) {
    return Err(RbacError::PermissionDenied);
}

Security Features:

  • Role-based permissions
  • Optional audit logging
  • Session management

CICD Mode

// CICD: Service account focused
// All actions logged for automation tracking
if mode == ExecutionMode::CICD {
    audit_log.log_automation_action(service_account, action).await;
}

Security Features:

  • Service account isolation
  • Mandatory audit logging
  • Token-based authentication
  • Short-lived credentials

Enterprise Mode

// Enterprise: Full security
// - Mandatory audit logging
// - Stricter session timeouts
// - Compliance reports
if mode == ExecutionMode::Enterprise {
    audit_log.log_with_compliance(user, action, compliance_tags).await;
}

Security Features:

  • Full RBAC enforcement
  • Comprehensive audit logging
  • Compliance reporting
  • Role assignment approval workflow

2.2 Permission System

Permission Levels:

Role::Admin         => 100  // Full access
Role::Operator      =>  80  // Deploy & manage
Role::Developer     =>  60  // Read + dev deploy
Role::ServiceAccount =>  50  // Automation
Role::Auditor       =>  40  // Read + audit
Role::Viewer        =>  20  // Read-only

Action Security Levels:

Action::Delete      => 100  // Destructive, admin only
Action::Manage      =>  80  // Service management
Action::Deploy      =>  80  // Deploy to production
Action::Create      =>  60  // Create resources
Action::Update      =>  60  // Modify resources
Action::Execute     =>  50  // Execute operations
Action::Audit       =>  40  // View audit logs
Action::Read        =>  20  // View resources

Permission Check:

pub fn can_perform(&self, required_level: u8) -> bool {
    self.permission_level() >= required_level
}

Security Guarantees:

  • Least privilege by default (Viewer role)
  • Hierarchical permissions (higher roles include lower)
  • Explicit deny for unknown resources
  • No permission escalation without admin
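
Putting the two level tables and `can_perform` together gives a compact, testable model. The sketch below copies the numeric levels listed above; the helper names `required_level` and `can` are assumptions added for the example, and the real check may also take the resource into account.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role { Admin, Operator, Developer, ServiceAccount, Auditor, Viewer }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action { Delete, Manage, Deploy, Create, Update, Execute, Audit, Read }

impl Role {
    pub fn permission_level(self) -> u8 {
        match self {
            Role::Admin => 100,
            Role::Operator => 80,
            Role::Developer => 60,
            Role::ServiceAccount => 50,
            Role::Auditor => 40,
            Role::Viewer => 20,
        }
    }

    pub fn can_perform(self, required_level: u8) -> bool {
        self.permission_level() >= required_level
    }

    /// Hypothetical convenience wrapper: level check for a given action.
    pub fn can(self, action: Action) -> bool {
        self.can_perform(action.required_level())
    }
}

impl Action {
    /// Hypothetical helper returning the action security level.
    pub fn required_level(self) -> u8 {
        match self {
            Action::Delete => 100,
            Action::Manage | Action::Deploy => 80,
            Action::Create | Action::Update => 60,
            Action::Execute => 50,
            Action::Audit => 40,
            Action::Read => 20,
        }
    }
}
```

One consequence of a purely numeric hierarchy is worth noting: every role at level 40 or above, including Developer and ServiceAccount, passes the `Action::Audit` check, and a Developer (60) does not pass `Action::Deploy` (80) on level alone.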

2.3 Session Security

Session Configuration:

[security]
session_timeout_minutes = 60     # Solo/MultiUser; set to 30 for Enterprise
max_sessions_per_user = 5
failed_login_lockout_attempts = 5
failed_login_lockout_duration_minutes = 15

Session Lifecycle:

  1. User authenticates → JWT token issued
  2. Token includes: user_id, role, issued_at, expires_at
  3. Middleware validates token on each request
  4. Session tracked in Redis/RocksDB
  5. Session invalidated on logout or timeout
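
Step 3 of the lifecycle includes a time-window check on the claims from step 2. The sketch below covers only that part, after the signature has already been verified by a JWT library; the `Claims` field names follow the list above, while `ClaimsError` and `validate_claims` are hypothetical, and timestamps are assumed to be Unix seconds.

```rust
/// Claims carried in the token (field names taken from the lifecycle above).
pub struct Claims {
    pub user_id: String,
    pub role: String,
    pub issued_at: u64,  // Unix seconds (assumed unit)
    pub expires_at: u64, // Unix seconds (assumed unit)
}

#[derive(Debug, PartialEq, Eq)]
pub enum ClaimsError {
    NotYetValid,
    Expired,
}

/// Checks only the validity window. Signature verification is assumed to
/// have happened before this function is called.
pub fn validate_claims(claims: &Claims, now: u64) -> Result<(), ClaimsError> {
    if now < claims.issued_at {
        return Err(ClaimsError::NotYetValid);
    }
    if now >= claims.expires_at {
        return Err(ClaimsError::Expired);
    }
    Ok(())
}
```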

Security Features:

  • JWT with RSA-2048 signatures
  • Refresh token rotation
  • Session fixation prevention
  • Concurrent session limits

Threat Mitigation:

  • Session Hijacking: Short-lived tokens (1 hour)
  • Token Replay: One-time refresh tokens
  • Brute Force: Account lockout after 5 failures
  • Session Fixation: New session ID on login
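
The brute-force mitigation (5 failures, 15-minute lockout) can be sketched as a small per-user counter. `LoginGuard` and its methods are hypothetical names; time is passed in as Unix seconds so the logic stays deterministic, and resetting the counter after a lock expires is an assumption of this example.

```rust
use std::collections::HashMap;

#[derive(Default)]
struct Attempts {
    failures: u32,
    locked_until: Option<u64>,
}

/// Hypothetical lockout tracker keyed by username.
pub struct LoginGuard {
    max_attempts: u32,
    lockout_secs: u64,
    attempts: HashMap<String, Attempts>,
}

impl LoginGuard {
    pub fn new(max_attempts: u32, lockout_minutes: u64) -> Self {
        LoginGuard {
            max_attempts,
            lockout_secs: lockout_minutes * 60,
            attempts: HashMap::new(),
        }
    }

    pub fn is_locked(&self, user: &str, now: u64) -> bool {
        self.attempts
            .get(user)
            .and_then(|a| a.locked_until)
            .map_or(false, |until| now < until)
    }

    pub fn record_failure(&mut self, user: &str, now: u64) {
        let entry = self.attempts.entry(user.to_string()).or_default();
        // A lock that has already expired starts a fresh count.
        if matches!(entry.locked_until, Some(until) if now >= until) {
            *entry = Attempts::default();
        }
        entry.failures += 1;
        if entry.failures >= self.max_attempts {
            entry.locked_until = Some(now + self.lockout_secs);
        }
    }

    /// A successful login clears the failure history.
    pub fn record_success(&mut self, user: &str) {
        self.attempts.remove(user);
    }
}
```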

2.4 Middleware Security

RBAC Middleware Flow:

Request → Auth Middleware → RBAC Middleware → Handler
            ↓                    ↓
        Extract User      Check Permission
        from JWT Token    (role + resource + action)
                               ↓
                         Allow / Deny

Middleware Implementation:

pub async fn check_permission(
    State(state): State<Arc<RbacMiddleware>>,
    resource: Resource,
    action: Action,
    req: Request,
    next: Next,
) -> Result<Response, RbacError> {
    // Clone the user out of the request extensions so the borrow on `req`
    // ends before `req` is moved into `next.run`.
    let user = req
        .extensions()
        .get::<User>()
        .cloned()
        .ok_or_else(|| RbacError::UserNotFound("No user in request".to_string()))?;

    if !state.rbac_manager.check_permission(&user, resource, action).await {
        return Err(RbacError::PermissionDenied);
    }

    Ok(next.run(req).await)
}

Security Guarantees:

  • All API endpoints protected by default
  • Permission checked before handler execution
  • User context available in handlers
  • Failed checks logged for audit

3. Platform Monitoring Security

3.1 Service Access Security

Internal URLs Only:

[platform]
orchestrator_url = "http://localhost:8080"       # Not exposed externally
coredns_url = "http://localhost:9153"
gitea_url = "http://localhost:3000"
oci_registry_url = "http://localhost:5000"

Network Security:

  • All services on localhost or internal network
  • No external exposure of monitoring endpoints
  • Firewall rules to prevent external access

Threat Mitigation:

  • External Scanning: Services not reachable from internet
  • DDoS: Internal-only access limits attack surface
  • Data Exfiltration: Monitoring data not exposed externally

3.2 Health Check Security

Timeout Protection:

let client = Client::builder()
    .timeout(std::time::Duration::from_secs(5))  // Prevent hanging
    .build()
    .expect("failed to build HTTP client");

Error Handling:

// Never expose internal errors to users
Err(e) => {
    // Log detailed error internally
    tracing::error!("Health check failed for {}: {}", service, e);

    // Return generic error externally
    ServiceStatus {
        status: HealthStatus::Unhealthy,
        error_message: Some("Service unavailable".to_string()),  // Generic
        ..Default::default()  // remaining fields; assumes ServiceStatus implements Default
    }
}

Threat Mitigation:

  • Timeout Attacks: 5-second timeout prevents resource exhaustion
  • Information Disclosure: Error messages sanitized
  • Resource Exhaustion: Parallel checks with concurrency limits
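
To show how a timeout and a failed probe both collapse into the same externally visible status, here is a standard-library sketch. It replaces the async HTTP client used above with a worker thread and a channel, purely so the example is self-contained; `check_with_timeout` is a hypothetical helper, and unlike a real client timeout it does not cancel the worker, which keeps running until its probe returns.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

#[derive(Debug, PartialEq, Eq)]
pub enum HealthStatus {
    Healthy,
    Unhealthy,
}

/// Runs `check` on a worker thread and waits at most `timeout` for it.
/// Hypothetical helper; the probe itself is supplied by the caller.
pub fn check_with_timeout<F>(check: F, timeout: Duration) -> HealthStatus
where
    F: FnOnce() -> Result<(), String> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone after a timeout; ignore that.
        let _ = tx.send(check());
    });

    match rx.recv_timeout(timeout) {
        Ok(Ok(())) => HealthStatus::Healthy,
        // Probe error, timeout, or a panicked worker all map to Unhealthy,
        // so the caller never sees internal error details.
        Ok(Err(_)) | Err(_) => HealthStatus::Unhealthy,
    }
}
```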

3.3 Service Control Security

RBAC-Protected Service Control:

// Only Operator or Admin can start/stop services
#[axum::debug_handler]
pub async fn start_service(
    State(state): State<AppState>,
    Extension(user): Extension<User>,
    Path(service_type): Path<String>,
) -> Result<StatusCode, ApiError> {
    // Check permission (the managers are assumed to be fields of AppState)
    if !state.rbac_manager.check_permission(
        &user,
        Resource::Service,
        Action::Manage,
    ).await {
        return Err(ApiError::PermissionDenied);
    }

    // Start service
    state.service_manager.start_service(&service_type).await?;

    // Audit log
    state.audit_log.log_service_action(user, service_type, "start").await;

    Ok(StatusCode::OK)
}

Security Guarantees:

  • Only authorized users can control services
  • All service actions logged
  • Graceful degradation on service failure

4. Threat Model

4.1 High-Risk Threats

Threat: SSH Private Key Exposure

Attack Vector: Attacker gains access to KMS database

Mitigations:

  • Private keys encrypted at rest with master key
  • Master key stored outside the key database, in a hardware security module (HSM) or external KMS (HSM integration is planned for Phase 3)
  • Key access audited and rate-limited
  • Zeroization of decrypted keys in memory

Detection:

  • Audit log monitoring for unusual key access patterns
  • Alerting on bulk key retrievals

Threat: Privilege Escalation

Attack Vector: Lower-privileged user attempts to gain admin access

Mitigations:

  • Role assignment requires Admin role
  • Mode switching requires Admin role
  • Middleware enforces permissions on every request
  • No client-side permission checks (server-side only)

Detection:

  • Failed permission checks logged
  • Alerting on repeated permission denials

Threat: Session Hijacking

Attack Vector: Attacker steals JWT token

Mitigations:

  • Short-lived access tokens (1 hour)
  • Refresh token rotation
  • Secure HTTP-only cookies (recommended)
  • IP address binding (optional)

Detection:

  • Unusual login locations
  • Concurrent sessions from different IPs

4.2 Medium-Risk Threats

Threat: Service Impersonation

Attack Vector: Malicious service pretends to be legitimate platform service

Mitigations:

  • Service URLs configured in config file (not dynamic)
  • TLS certificate validation (if HTTPS)
  • Service authentication tokens

Detection:

  • Health check failures
  • Metrics anomalies

Threat: Audit Log Tampering

Attack Vector: Attacker modifies audit logs to hide tracks

Mitigations:

  • Audit logs append-only
  • Logs persisted in SurrealDB, with tamper evidence provided by the hash chain
  • Hash chain for log integrity
  • Offsite log backup

Detection:

  • Hash chain verification
  • Log gap detection
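
Hash chain verification and gap detection can be illustrated with a few lines of code: each entry stores the hash of its predecessor, so editing or removing an entry breaks every later link. This sketch uses the standard library's `DefaultHasher` only to stay dependency-free; it is not cryptographic and is an assumption of the example, so a real log would use SHA-256 or similar. `ChainedEntry`, `append`, and `verify` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone)]
pub struct ChainedEntry {
    pub message: String,
    pub prev_hash: u64,
    pub hash: u64,
}

// Stand-in hash (NOT cryptographic); swap in SHA-256 for real use.
fn link_hash(prev_hash: u64, message: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    prev_hash.hash(&mut hasher);
    message.hash(&mut hasher);
    hasher.finish()
}

/// Appends a message, linking it to the hash of the previous entry.
pub fn append(log: &mut Vec<ChainedEntry>, message: &str) {
    let prev_hash = log.last().map_or(0, |e| e.hash);
    log.push(ChainedEntry {
        message: message.to_string(),
        prev_hash,
        hash: link_hash(prev_hash, message),
    });
}

/// Returns Err(index) for the first entry whose link or content hash
/// does not match, which also catches removed entries (gaps).
pub fn verify(log: &[ChainedEntry]) -> Result<(), usize> {
    let mut expected_prev = 0u64;
    for (i, entry) in log.iter().enumerate() {
        let content_ok = entry.hash == link_hash(entry.prev_hash, &entry.message);
        if entry.prev_hash != expected_prev || !content_ok {
            return Err(i);
        }
        expected_prev = entry.hash;
    }
    Ok(())
}
```

An attacker who can rewrite the whole chain can still recompute every hash, which is why the offsite backup (or a periodically published head hash) matters alongside the chain itself.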

4.3 Low-Risk Threats

Threat: Information Disclosure via Error Messages

Attack Vector: Error messages leak internal information

Mitigations:

  • Generic error messages for users
  • Detailed errors only in server logs
  • Error message sanitization

Detection:

  • Code review for error handling
  • Automated scanning for sensitive data in responses

5. Compliance Considerations

5.1 GDPR Compliance

Personal Data Handling:

  • User information: username, email, IP addresses
  • Retention: Audit logs kept for required period
  • Right to erasure: User deletion removes the user record and SSH keys; audit log entries are anonymized, not deleted

Implementation:

// Delete user and all associated data
pub async fn delete_user(&self, user_id: &str) -> Result<(), RbacError> {
    // Delete user SSH keys
    for key in self.list_user_ssh_keys(user_id).await? {
        self.delete_ssh_key(&key.key_id).await?;
    }

    // Anonymize audit logs (retain for compliance, remove PII)
    self.anonymize_user_audit_logs(user_id).await?;

    // Delete user record
    self.delete_user_record(user_id).await?;

    Ok(())
}
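
The anonymization step could look roughly like the following, which clears the identifying fields of matching entries while keeping the event itself. `AuditRecord` is a trimmed, hypothetical stand-in for `SshKeyAuditEntry`, and `anonymize_user` is not the actual `anonymize_user_audit_logs` implementation.

```rust
/// Trimmed, hypothetical stand-in for SshKeyAuditEntry.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AuditRecord {
    pub key_id: String,
    pub user: Option<String>,
    pub ip_address: Option<String>,
    pub success: bool,
}

/// Removes PII (user and source IP) from every record that belongs to
/// `user_id`, keeping the rest of the entry. Returns the number of
/// records that were changed.
pub fn anonymize_user(records: &mut [AuditRecord], user_id: &str) -> usize {
    let mut changed = 0;
    for record in records.iter_mut() {
        if record.user.as_deref() == Some(user_id) {
            record.user = None;
            record.ip_address = None;
            changed += 1;
        }
    }
    changed
}
```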

5.2 SOC 2 Compliance

Security Controls:

  • Access control (RBAC)
  • Audit logging (all actions logged)
  • Encryption at rest (KMS)
  • Encryption in transit (HTTPS recommended)
  • Session management (timeouts; MFA is planned for Phase 3)

Monitoring & Alerting:

  • Service health monitoring
  • Failed login tracking
  • Permission denial alerting
  • Unusual activity detection

5.3 PCI DSS (if applicable)

Requirements:

  • Encrypt cardholder data (use KMS for keys)
  • Maintain access control (RBAC)
  • Track and monitor access (audit logs)
  • Regularly test security (integration tests)

6. Security Best Practices

6.1 Development

Code Review Checklist:

  • All API endpoints have RBAC middleware
  • No hardcoded secrets or keys
  • Error messages don't leak sensitive info
  • Audit logging for sensitive operations
  • Input validation on all user inputs
  • SQL injection prevention (use parameterized queries)
  • XSS prevention (escape user-supplied data on output)

Testing:

  • Unit tests for permission checks
  • Integration tests for RBAC enforcement
  • Penetration testing for production deployments

6.2 Deployment

Production Checklist:

  • Change default admin password
  • Enable HTTPS with valid certificate
  • Configure firewall rules (internal services only)
  • Set appropriate execution mode (Enterprise for production)
  • Enable audit logging
  • Configure session timeout (30 minutes for Enterprise)
  • Enable rate limiting
  • Set up log monitoring and alerting
  • Regular security updates
  • Backup encryption keys

6.3 Operations

Incident Response:

  1. Detection: Monitor audit logs for anomalies
  2. Containment: Revoke compromised credentials
  3. Eradication: Rotate affected SSH keys
  4. Recovery: Restore from backup if needed
  5. Lessons Learned: Update security controls

Key Rotation Schedule:

  • SSH keys: Every 90 days (Enterprise: 30 days)
  • JWT signing keys: Every 180 days
  • Master encryption key: Every 365 days
  • Service account tokens: Every 30 days

7. Security Metrics

7.1 Monitoring Metrics

Authentication:

  • Failed login attempts per user
  • Concurrent sessions per user
  • Session duration (average, p95, p99)

Authorization:

  • Permission denials per user
  • Permission denials per resource
  • Role assignments per day

Audit:

  • SSH key accesses per day
  • SSH key rotations per month
  • Audit log retention compliance

Services:

  • Service health check success rate
  • Service response times (p50, p95, p99)
  • Service dependency failures

7.2 Alerting Thresholds

Critical Alerts:

  • Service health: >3 failures in 5 minutes
  • Failed logins: >10 attempts in 1 minute
  • Permission denials: >50 in 1 minute
  • SSH key bulk retrieval: >10 keys in 1 minute

Warning Alerts:

  • Service degraded: response time >1 second
  • Session timeout rate: >10% of sessions
  • Audit log storage: >80% capacity
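
Thresholds of the form "more than N events in M seconds" map naturally onto a sliding-window counter. The sketch below is one possible shape, assuming Unix-second timestamps; `RateAlert` and `record` are hypothetical names, not part of the monitoring code.

```rust
use std::collections::VecDeque;

/// Hypothetical sliding-window counter, e.g. RateAlert::new(10, 60) for
/// "failed logins: >10 attempts in 1 minute".
pub struct RateAlert {
    threshold: usize,
    window_secs: u64,
    events: VecDeque<u64>,
}

impl RateAlert {
    pub fn new(threshold: usize, window_secs: u64) -> Self {
        RateAlert { threshold, window_secs, events: VecDeque::new() }
    }

    /// Records an event at `now` and returns true when the number of
    /// events inside the window exceeds the threshold.
    pub fn record(&mut self, now: u64) -> bool {
        self.events.push_back(now);
        // Drop events that have fallen out of the window.
        while let Some(&oldest) = self.events.front() {
            if now.saturating_sub(oldest) >= self.window_secs {
                self.events.pop_front();
            } else {
                break;
            }
        }
        self.events.len() > self.threshold
    }
}
```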

8. Security Roadmap

Phase 1 (Completed)

  • SSH key storage with encryption
  • Mode-based RBAC
  • Audit logging
  • Platform monitoring

Phase 2 (In Progress)

  • 📋 API handlers with RBAC enforcement
  • 📋 Integration tests for security
  • 📋 Documentation

Phase 3 (Future)

  • Multi-factor authentication (MFA)
  • Hardware security module (HSM) integration
  • Advanced threat detection (ML-based)
  • Automated security scanning
  • Compliance report generation
  • Security information and event management (SIEM) integration


Last Updated: 2025-10-06
Review Cycle: Quarterly
Next Review: 2026-01-06