Jesús Pérez f2be2414e4 core: init repo and codebase

2025-10-07 10:59:52 +01:00


Security Considerations for Control Center Enhancements

Overview

This document outlines the security architecture and considerations for the control-center enhancements, including KMS SSH key management, mode-based RBAC, and platform service monitoring.

1. SSH Key Management Security

1.1 Key Storage Security

Implementation:

Private keys encrypted at rest using AES-256-GCM in KMS
Public keys stored in plaintext (as they are meant to be public)
Private key material never exposed in API responses
Key IDs used as references, not actual keys

Threat Mitigation:

✅ Data at Rest: All private keys encrypted with master encryption key
✅ Key Exposure: Private keys only decrypted in memory when needed
✅ Key Leakage: Zeroization of key material after use
✅ Unauthorized Access: KMS access controlled by RBAC

Best Practices:

// Good: store the key once, then log and pass around only the returned key ID
let key_id = ssh_key_manager
    .store_ssh_key(name, private_key, public_key, purpose, tags)
    .await?;
tracing::info!("Stored SSH key {}", key_id);

// Bad: never log private key material
tracing::info!("Stored key: {}", private_key);  // DON'T DO THIS
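The logging rule above is easier to keep if decrypted key material lives in a type that cannot be printed and wipes itself when dropped. Below is a minimal sketch that uses a hypothetical `SecretKeyMaterial` wrapper, which is not part of the codebase described here. In production, the `zeroize` crate covers more cases than this hand-rolled `Drop`, such as copies left behind by earlier reallocations.

```rust
use std::fmt;
use std::ptr;
use std::sync::atomic::{compiler_fence, Ordering};

/// Hypothetical wrapper for decrypted private-key bytes held in memory.
/// There is no `Display` impl, so `{}` formatting does not compile, and the
/// `Debug` output is redacted, so `{:?}` cannot leak the key either.
pub struct SecretKeyMaterial(Vec<u8>);

impl SecretKeyMaterial {
    pub fn new(bytes: Vec<u8>) -> Self {
        Self(bytes)
    }

    /// Borrow the raw bytes only for the operation that needs them.
    pub fn expose(&self) -> &[u8] {
        &self.0
    }
}

impl fmt::Debug for SecretKeyMaterial {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "SecretKeyMaterial([REDACTED; {} bytes])", self.0.len())
    }
}

impl Drop for SecretKeyMaterial {
    fn drop(&mut self) {
        // Volatile writes stop the compiler from removing the wipe as a dead store.
        for byte in self.0.iter_mut() {
            unsafe { ptr::write_volatile(byte, 0) };
        }
        // Keep the wipe ordered before the buffer is deallocated.
        compiler_fence(Ordering::SeqCst);
    }
}
```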

1.2 Key Rotation Security

Implementation:

Configurable rotation intervals (default 90 days)
Grace period for old key usage (default 7 days)
Automatic rotation scheduling (if enabled)
Manual rotation support with immediate effect

Threat Mitigation:

✅ Key Compromise: Regular rotation limits exposure window
✅ Stale Keys: Automated detection of keys due for rotation
✅ Rotation Failures: Graceful degradation with error logging

Rotation Policy:

[kms.ssh_keys]
rotation_enabled = true
rotation_interval_days = 90   # Enterprise: 30, Dev: 180
grace_period_days = 7          # Time to update deployed keys
auto_rotate = false            # Manual approval recommended
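The policy above implies four states for a key. The sketch below is one possible reading. It assumes that the grace period starts when a replacement key is issued and that the old key is rejected once the grace period ends. `RotationPolicy`, `KeyState`, and `key_state` are hypothetical names.

```rust
use std::time::{Duration, SystemTime};

/// Mirrors the `[kms.ssh_keys]` settings above.
pub struct RotationPolicy {
    pub rotation_interval_days: u64,
    pub grace_period_days: u64,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KeyState {
    Active,         // within the rotation interval
    DueForRotation, // interval elapsed, not yet rotated
    GracePeriod,    // rotated; old key still accepted while deployments update
    Expired,        // rotated and grace period over: reject the old key
}

fn days(n: u64) -> Duration {
    Duration::from_secs(n * 24 * 60 * 60)
}

/// `rotated_at` is when a replacement key was issued (None if never rotated).
pub fn key_state(
    created_at: SystemTime,
    rotated_at: Option<SystemTime>,
    now: SystemTime,
    policy: &RotationPolicy,
) -> KeyState {
    if let Some(rotated) = rotated_at {
        let since_rotation = now.duration_since(rotated).unwrap_or(Duration::ZERO);
        return if since_rotation < days(policy.grace_period_days) {
            KeyState::GracePeriod
        } else {
            KeyState::Expired
        };
    }
    let age = now.duration_since(created_at).unwrap_or(Duration::ZERO);
    if age >= days(policy.rotation_interval_days) {
        KeyState::DueForRotation
    } else {
        KeyState::Active
    }
}
```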

1.3 Audit Logging

Logged Events:

SSH key creation (who, when, purpose)
SSH key retrieval (who accessed, when)
SSH key rotation (old key ID, new key ID)
SSH key deletion (who deleted, when)
Failed access attempts

Audit Entry Structure:

pub struct SshKeyAuditEntry {
    pub timestamp: DateTime<Utc>,
    pub key_id: String,
    pub action: SshKeyAction,
    pub user: Option<String>,      // User who performed action
    pub ip_address: Option<String>, // Source IP
    pub success: bool,
    pub error_message: Option<String>,
}

Threat Mitigation:

✅ Unauthorized Access: Full audit trail for forensics
✅ Insider Threats: User attribution for all actions
✅ Compliance: Audit trail supports GDPR/SOC 2 log requirements

Audit Log Retention:

In-memory: Last 10,000 entries
Persistent: SurrealDB with 1-year retention
Compliance mode: 7-year retention (configurable)
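The in-memory tier can be modelled as a bounded ring buffer that evicts the oldest entry once the cap is reached (10,000 in the list above). Below is a sketch with hypothetical, simplified types. The persistent SurrealDB write is only marked by a comment.

```rust
use std::collections::VecDeque;

/// Hypothetical action variants matching the logged events listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SshKeyAction {
    Create,
    Retrieve,
    Rotate,
    Delete,
}

/// Trimmed-down version of `SshKeyAuditEntry` (timestamp as Unix seconds).
#[derive(Debug, Clone)]
pub struct AuditEntry {
    pub timestamp: u64,
    pub key_id: String,
    pub action: SshKeyAction,
    pub user: Option<String>,
    pub success: bool,
}

/// Keeps only the most recent `capacity` entries in memory.
pub struct InMemoryAuditLog {
    capacity: usize,
    entries: VecDeque<AuditEntry>,
}

impl InMemoryAuditLog {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        Self { capacity, entries: VecDeque::with_capacity(capacity) }
    }

    pub fn record(&mut self, entry: AuditEntry) {
        // A real implementation would also write the entry to the persistent store here.
        if self.entries.len() == self.capacity {
            self.entries.pop_front(); // evict the oldest entry
        }
        self.entries.push_back(entry);
    }

    /// Failed access attempts still held in memory, oldest first.
    pub fn failed_attempts(&self) -> impl Iterator<Item = &AuditEntry> + '_ {
        self.entries.iter().filter(|e| !e.success)
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }

    pub fn oldest(&self) -> Option<&AuditEntry> {
        self.entries.front()
    }
}
```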

1.4 Key Fingerprinting

Implementation:

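use base64::{engine::general_purpose::STANDARD_NO_PAD, Engine as _};
use sha2::{Digest, Sha256};

/// `public_key` must be the raw SSH key blob (the base64-decoded field of an
/// authorized_keys line). Hashing it to the full 32-byte digest and encoding it
/// without padding yields the same "SHA256:..." string as `ssh-keygen -l`.
fn calculate_fingerprint(public_key: &[u8]) -> Result<String, KmsError> {
    let digest = Sha256::digest(public_key);
    Ok(format!("SHA256:{}", STANDARD_NO_PAD.encode(digest)))
}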

Security Benefits:

Verify key integrity
Detect key tampering
Match deployed keys to KMS records

2. RBAC Security

2.1 Execution Modes

Security Model by Mode:

Mode	Security Level	Use Case	Audit Required
Solo	Low	Single developer	No
MultiUser	Medium	Small teams	Optional
CICD	Medium	Automation	Yes
Enterprise	High	Production	Mandatory
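The table can be encoded directly, so every call site asks the mode instead of re-deriving the rules. Below is a sketch with hypothetical names. Role and action levels are plain numbers here, matching the scale in section 2.2.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExecutionMode {
    Solo,
    MultiUser,
    CICD,
    Enterprise,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AuditPolicy {
    Disabled,
    Optional,
    Mandatory,
    MandatoryWithCompliance,
}

impl ExecutionMode {
    /// Solo mode skips RBAC entirely; every other mode enforces it.
    pub fn enforces_rbac(self) -> bool {
        !matches!(self, ExecutionMode::Solo)
    }

    pub fn audit_policy(self) -> AuditPolicy {
        match self {
            ExecutionMode::Solo => AuditPolicy::Disabled,
            ExecutionMode::MultiUser => AuditPolicy::Optional,
            ExecutionMode::CICD => AuditPolicy::Mandatory,
            ExecutionMode::Enterprise => AuditPolicy::MandatoryWithCompliance,
        }
    }
}

/// Allows the operation if the mode does not enforce RBAC, or if the
/// role's level meets the level the action requires.
pub fn authorize(mode: ExecutionMode, role_level: u8, required_level: u8) -> bool {
    !mode.enforces_rbac() || role_level >= required_level
}
```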

Mode-Specific Security:

Solo Mode

// Solo mode: All users are admin
// Security: Trust-based, no RBAC checks
if mode == ExecutionMode::Solo {
    return true;  // Allow all operations
}

Risks:

No access control
No audit trail
Single point of failure

Mitigations:

Only for development environments
Network isolation required
Regular backups

MultiUser Mode

// Multi-user: Role-based access control
let permissions = rbac_manager.get_user_permissions(&user).await;
if !permissions.contains(&required_permission) {
    return Err(RbacError::PermissionDenied);
}

Security Features:

Role-based permissions
Optional audit logging
Session management

CICD Mode

// CICD: Service account focused
// All actions logged for automation tracking
if mode == ExecutionMode::CICD {
    audit_log.log_automation_action(service_account, action).await;
}

Security Features:

Service account isolation
Mandatory audit logging
Token-based authentication
Short-lived credentials

Enterprise Mode

// Enterprise: Full security
// - Mandatory audit logging
// - Stricter session timeouts
// - Compliance reports
if mode == ExecutionMode::Enterprise {
    audit_log.log_with_compliance(user, action, compliance_tags).await;
}

Security Features:

Full RBAC enforcement
Comprehensive audit logging
Compliance reporting
Role assignment approval workflow

2.2 Permission System

Permission Levels:

Role::Admin         => 100  // Full access
Role::Operator      =>  80  // Deploy & manage
Role::Developer     =>  60  // Read + dev deploy
Role::ServiceAccount =>  50  // Automation
Role::Auditor       =>  40  // Read + audit
Role::Viewer        =>  20  // Read-only

Action Security Levels:

Action::Delete      => 100  // Destructive, admin only
Action::Manage      =>  80  // Service management
Action::Deploy      =>  80  // Deploy to production
Action::Create      =>  60  // Create resources
Action::Update      =>  60  // Modify resources
Action::Execute     =>  50  // Execute operations
Action::Audit       =>  40  // View audit logs
Action::Read        =>  20  // View resources

Permission Check:

pub fn can_perform(&self, required_level: u8) -> bool {
    self.permission_level() >= required_level
}
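Below is a self-contained version of the level tables and `can_perform`, with `Viewer` as the `Default` role to express least privilege. The enum layout is an assumption; only the numbers come from the tables above. The sketch also exposes two consequences of pure level comparison. First, `Developer` (60) cannot perform `Deploy` (80), so the "dev deploy" right must come from a separate environment-specific rule. Second, `ServiceAccount` and `Developer` both clear the `Audit` threshold (40).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum Role {
    Admin,
    Operator,
    Developer,
    ServiceAccount,
    Auditor,
    #[default]
    Viewer, // least privilege by default
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    Delete,
    Manage,
    Deploy,
    Create,
    Update,
    Execute,
    Audit,
    Read,
}

impl Role {
    pub fn permission_level(self) -> u8 {
        match self {
            Role::Admin => 100,
            Role::Operator => 80,
            Role::Developer => 60,
            Role::ServiceAccount => 50,
            Role::Auditor => 40,
            Role::Viewer => 20,
        }
    }

    /// Hierarchical check: a role may perform any action at or below its level.
    pub fn can_perform(self, action: Action) -> bool {
        self.permission_level() >= action.required_level()
    }
}

impl Action {
    pub fn required_level(self) -> u8 {
        match self {
            Action::Delete => 100,
            Action::Manage | Action::Deploy => 80,
            Action::Create | Action::Update => 60,
            Action::Execute => 50,
            Action::Audit => 40,
            Action::Read => 20,
        }
    }
}
```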

Security Guarantees:

✅ Least privilege by default (Viewer role)
✅ Hierarchical permissions (higher roles include lower)
✅ Explicit deny for unknown resources
✅ No permission escalation without admin

2.3 Session Security

Session Configuration:

[security]
session_timeout_minutes = 60     # Solo/MultiUser; set to 30 for Enterprise
max_sessions_per_user = 5
failed_login_lockout_attempts = 5
failed_login_lockout_duration_minutes = 15
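The two `failed_login_lockout_*` settings translate to a per-user failure counter with a lockout deadline. Below is a minimal in-memory sketch under that assumption; `LoginGuard` is hypothetical. A multi-instance deployment would keep this state in shared storage instead, for example the Redis/RocksDB store used for sessions.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

#[derive(Default)]
struct Attempts {
    failures: u32,
    locked_until: Option<Instant>,
}

/// Tracks failed logins per user, mirroring `failed_login_lockout_*` above.
pub struct LoginGuard {
    max_attempts: u32,
    lockout: Duration,
    users: HashMap<String, Attempts>,
}

impl LoginGuard {
    pub fn new(max_attempts: u32, lockout: Duration) -> Self {
        Self { max_attempts, lockout, users: HashMap::new() }
    }

    /// Check this before verifying the password.
    pub fn is_locked(&self, user: &str, now: Instant) -> bool {
        self.users
            .get(user)
            .and_then(|a| a.locked_until)
            .map_or(false, |until| now < until)
    }

    pub fn record_failure(&mut self, user: &str, now: Instant) {
        let attempts = self.users.entry(user.to_string()).or_default();
        attempts.failures += 1;
        if attempts.failures >= self.max_attempts {
            attempts.locked_until = Some(now + self.lockout);
            attempts.failures = 0; // start counting again after the lockout
        }
    }

    /// A successful login clears the failure history.
    pub fn record_success(&mut self, user: &str) {
        self.users.remove(user);
    }
}
```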

Session Lifecycle:

User authenticates → JWT token issued
Token includes: user_id, role, issued_at, expires_at
Middleware validates token on each request
Session tracked in Redis/RocksDB
Session invalidated on logout or timeout
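After the middleware has verified the token's signature, it still has to check the time-based claims. The sketch below assumes Unix-second timestamps and the claim fields listed above. `validate_claims` and `TokenError` are hypothetical. The `LifetimeTooLong` check enforces the 1-hour access-token limit even if a token was minted with a longer expiry.

```rust
/// Claims carried in the JWT, per the lifecycle above (times are Unix seconds).
pub struct Claims {
    pub user_id: String,
    pub role: String,
    pub issued_at: u64,
    pub expires_at: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum TokenError {
    Expired,
    IssuedInFuture,
    LifetimeTooLong,
}

/// Run this only AFTER the RSA signature has been verified.
pub fn validate_claims(
    claims: &Claims,
    now: u64,
    max_lifetime_secs: u64,
    clock_skew_secs: u64,
) -> Result<(), TokenError> {
    // Tolerate small clock differences between issuer and verifier.
    if claims.issued_at > now + clock_skew_secs {
        return Err(TokenError::IssuedInFuture);
    }
    if now >= claims.expires_at {
        return Err(TokenError::Expired);
    }
    // Reject tokens minted with a longer lifetime than policy allows.
    if claims.expires_at.saturating_sub(claims.issued_at) > max_lifetime_secs {
        return Err(TokenError::LifetimeTooLong);
    }
    Ok(())
}
```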

Security Features:

JWT with RSA-2048 signatures
Refresh token rotation
Session fixation prevention
Concurrent session limits

Threat Mitigation:

✅ Session Hijacking: Short-lived tokens (1 hour)
✅ Token Replay: One-time refresh tokens
✅ Brute Force: Account lockout after 5 failures
✅ Session Fixation: New session ID on login
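One-time refresh tokens only stop replay if the server remembers which tokens have already been exchanged. The sketch below uses a hypothetical `RefreshTokenStore` that rotates the token on every use. When a consumed token reappears, the store revokes all of that user's active tokens, because either the attacker or the legitimate user is holding a stolen copy. Token generation is left to the caller and should use a CSPRNG. A real store would also persist token hashes rather than raw tokens.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq)]
pub enum RefreshError {
    /// A token that was already exchanged was presented again: likely stolen.
    Reused,
    Unknown,
}

/// One-time refresh tokens: each can be exchanged exactly once.
#[derive(Default)]
pub struct RefreshTokenStore {
    active: HashMap<String, String>,   // token -> user_id
    consumed: HashMap<String, String>, // token -> user_id, kept to detect replay
}

impl RefreshTokenStore {
    pub fn issue(&mut self, user_id: &str, token: String) {
        self.active.insert(token, user_id.to_string());
    }

    /// Exchange `old` for `new`; returns the user the new token belongs to.
    pub fn rotate(&mut self, old: &str, new: String) -> Result<String, RefreshError> {
        if let Some(user) = self.active.remove(old) {
            self.consumed.insert(old.to_string(), user.clone());
            self.active.insert(new, user.clone());
            return Ok(user);
        }
        if let Some(user) = self.consumed.get(old).cloned() {
            // Replay detected: revoke every active token for this user.
            self.active.retain(|_, owner| *owner != user);
            return Err(RefreshError::Reused);
        }
        Err(RefreshError::Unknown)
    }
}
```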

2.4 Middleware Security

RBAC Middleware Flow:

Request → Auth Middleware → RBAC Middleware → Handler
            ↓                    ↓
        Extract User      Check Permission
        from JWT Token    (role + resource + action)
                               ↓
                         Allow / Deny

Middleware Implementation:

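pub async fn check_permission(
    State(state): State<Arc<RbacMiddleware>>,
    // Not axum extractors by themselves: bind them per route, e.g. by wrapping
    // this function in a closure passed to `middleware::from_fn_with_state`
    resource: Resource,
    action: Action,
    req: Request,
    next: Next,
) -> Result<Response, RbacError> {
    // The auth middleware runs first and stores the authenticated User in the
    // request extensions; a missing user means the request bypassed it
    let user = req
        .extensions()
        .get::<User>()
        .ok_or_else(|| RbacError::UserNotFound("No user in request".to_string()))?;

    if !state.rbac_manager.check_permission(user, resource, action).await {
        return Err(RbacError::PermissionDenied);
    }

    Ok(next.run(req).await)
}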

Security Guarantees:

✅ All API endpoints protected by default
✅ Permission checked before handler execution
✅ User context available in handlers
✅ Failed checks logged for audit

3. Platform Monitoring Security

3.1 Service Access Security

Internal URLs Only:

[platform]
orchestrator_url = "http://localhost:8080"       # Not exposed externally
coredns_url = "http://localhost:9153"
gitea_url = "http://localhost:3000"
oci_registry_url = "http://localhost:5000"

Network Security:

All services on localhost or internal network
No external exposure of monitoring endpoints
Firewall rules to prevent external access
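The "internal URLs only" rule can be checked when the `[platform]` section is loaded, by rejecting any URL that does not point at a loopback or private address. Below is a std-only sketch; `is_internal_url` is hypothetical. It deliberately rejects ordinary hostnames other than `localhost`, because it cannot know what they resolve to.

```rust
use std::net::IpAddr;

/// Returns true only for http(s) URLs whose host is `localhost`, a loopback
/// address, or an RFC 1918 private IPv4 address.
pub fn is_internal_url(url: &str) -> bool {
    let Some(rest) = url
        .strip_prefix("http://")
        .or_else(|| url.strip_prefix("https://"))
    else {
        return false; // unknown scheme
    };
    // The authority is everything before the first '/', e.g. "localhost:8080".
    let authority = rest.split('/').next().unwrap_or("");
    let host = if let Some(v6) = authority.strip_prefix('[') {
        // Bracketed IPv6 literal such as "[::1]:9153"
        v6.split(']').next().unwrap_or("")
    } else {
        // Drop an optional ":port"
        authority.rsplit_once(':').map_or(authority, |(h, _)| h)
    };
    if host.eq_ignore_ascii_case("localhost") {
        return true;
    }
    match host.parse::<IpAddr>() {
        Ok(IpAddr::V4(v4)) => v4.is_loopback() || v4.is_private(),
        Ok(IpAddr::V6(v6)) => v6.is_loopback(),
        // Other hostnames are rejected: their resolution is unknown here.
        Err(_) => false,
    }
}
```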

Threat Mitigation:

✅ External Scanning: Services not reachable from internet
✅ DDoS: Internal-only access limits attack surface
✅ Data Exfiltration: Monitoring data not exposed externally

3.2 Health Check Security

Timeout Protection:

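use std::time::Duration;
use reqwest::Client;

let client = Client::builder()
    .timeout(Duration::from_secs(5))  // Cap every health check at 5 seconds to prevent hanging
    .build()
    .expect("failed to build health-check HTTP client");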

Error Handling:

// Never expose internal errors to users
let status = match result {
    Ok(status) => status,
    Err(e) => {
        // Log detailed error internally
        tracing::error!("Health check failed for {}: {}", service, e);

        // Return generic error externally
        ServiceStatus {
            status: HealthStatus::Unhealthy,
            error_message: Some("Service unavailable".to_string()),  // Generic
            ..Default::default()  // Assumes ServiceStatus implements Default
        }
    }
};

Threat Mitigation:

✅ Timeout Attacks: 5-second timeout prevents resource exhaustion
✅ Information Disclosure: Error messages sanitized
✅ Resource Exhaustion: Parallel checks with concurrency limits
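The 5-second timeout and the concurrency limit can be combined by probing services in fixed-size batches. The sketch below is a simplification. It uses a plain TCP connect (`std::net::TcpStream::connect_timeout`) instead of an HTTP health request, and it takes `host:port` socket addresses rather than the URLs in `[platform]`. `check_all` and `max_concurrent` are hypothetical.

```rust
use std::net::{SocketAddr, TcpStream};
use std::thread;
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HealthStatus {
    Healthy,
    Unhealthy,
}

/// Healthy if a TCP connection succeeds within `timeout`.
fn probe(addr: &str, timeout: Duration) -> HealthStatus {
    match addr.parse::<SocketAddr>() {
        Ok(sock) if TcpStream::connect_timeout(&sock, timeout).is_ok() => HealthStatus::Healthy,
        _ => HealthStatus::Unhealthy, // unparseable, refused, or timed out
    }
}

/// Probes all addresses, at most `max_concurrent` at a time; results keep input order.
pub fn check_all(addrs: &[&str], timeout: Duration, max_concurrent: usize) -> Vec<HealthStatus> {
    let mut results = Vec::with_capacity(addrs.len());
    for batch in addrs.chunks(max_concurrent.max(1)) {
        thread::scope(|s| {
            let handles: Vec<_> = batch
                .iter()
                .map(|addr| s.spawn(move || probe(addr, timeout)))
                .collect();
            for handle in handles {
                // A panicked probe counts as unhealthy rather than crashing the checker.
                results.push(handle.join().unwrap_or(HealthStatus::Unhealthy));
            }
        });
    }
    results
}
```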

3.3 Service Control Security

RBAC-Protected Service Control:

// Only Operator or Admin can start/stop services (Action::Manage requires level 80)
#[axum::debug_handler]
pub async fn start_service(
    State(state): State<AppState>,
    Extension(user): Extension<User>,
    Path(service_type): Path<String>,
) -> Result<StatusCode, ApiError> {
    // Check permission before doing anything else
    if !state
        .rbac_manager
        .check_permission(&user, Resource::Service, Action::Manage)
        .await
    {
        return Err(ApiError::PermissionDenied);
    }

    // Start service
    state.service_manager.start_service(&service_type).await?;

    // Audit log
    state.audit_log.log_service_action(user, service_type, "start").await;

    Ok(StatusCode::OK)
}

Security Guarantees:

✅ Only authorized users can control services
✅ All service actions logged
✅ Graceful degradation on service failure
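`start_service` passes the raw `service_type` path segment straight to the service manager. Parsing the segment into a closed enum first limits it to the known platform services, in line with the "input validation on all user inputs" checklist item. In the sketch below, the accepted names are assumed from the `[platform]` config keys, and `PlatformService` is hypothetical.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PlatformService {
    Orchestrator,
    CoreDns,
    Gitea,
    OciRegistry,
}

impl FromStr for PlatformService {
    type Err = String;

    /// Exact, case-sensitive allowlist match; anything else is rejected.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "orchestrator" => Ok(Self::Orchestrator),
            "coredns" => Ok(Self::CoreDns),
            "gitea" => Ok(Self::Gitea),
            "oci_registry" => Ok(Self::OciRegistry),
            // Generic message: do not echo the untrusted input back to the client.
            _ => Err("unknown service".to_string()),
        }
    }
}
```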

4. Threat Model

4.1 High-Risk Threats

Threat: SSH Private Key Exposure

Attack Vector: Attacker gains access to KMS database

Mitigations:

Private keys encrypted at rest with master key
Master key stored in KMS (hardware security module (HSM) integration is a Phase 3 roadmap item)
Key access audited and rate-limited
Zeroization of decrypted keys in memory

Detection:

Audit log monitoring for unusual key access patterns
Alerting on bulk key retrievals

Threat: Privilege Escalation

Attack Vector: Lower-privileged user attempts to gain admin access

Mitigations:

Role assignment requires Admin role
Mode switching requires Admin role
Middleware enforces permissions on every request
No client-side permission checks (server-side only)

Detection:

Failed permission checks logged
Alerting on repeated permission denials

Threat: Session Hijacking

Attack Vector: Attacker steals JWT token

Mitigations:

Short-lived access tokens (1 hour)
Refresh token rotation
Secure HTTP-only cookies (recommended)
IP address binding (optional)

Detection:

Unusual login locations
Concurrent sessions from different IPs

4.2 Medium-Risk Threats

Threat: Service Impersonation

Attack Vector: Malicious service pretends to be legitimate platform service

Mitigations:

Service URLs configured in config file (not dynamic)
TLS certificate validation (if HTTPS)
Service authentication tokens

Detection:

Health check failures
Metrics anomalies

Threat: Audit Log Tampering

Attack Vector: Attacker modifies audit logs to hide tracks

Mitigations:

Audit logs are append-only
Logs stored in SurrealDB, with tamper evidence provided by the hash chain
Hash chain for log integrity
Offsite log backup

Detection:

Hash chain verification
Log gap detection
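Hash-chain verification and gap detection can share one pass if each entry stores a sequence number and the hash of its predecessor. Below is a sketch with hypothetical types. It uses `std`'s `DefaultHasher` only to stay dependency-free, and that hasher is NOT cryptographically secure, so a real chain should use SHA-256 (as the fingerprinting code does). An attacker who can rewrite every entry can also recompute the chain, which is why the offsite backup above still matters.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone)]
pub struct ChainedEntry {
    pub seq: u64,
    pub message: String,
    pub prev_hash: u64,
    pub hash: u64,
}

/// PLACEHOLDER HASH: SipHash via DefaultHasher is not collision-resistant
/// against an attacker. Replace with SHA-256 in a real audit log.
fn entry_hash(seq: u64, message: &str, prev_hash: u64) -> u64 {
    let mut h = DefaultHasher::new();
    seq.hash(&mut h);
    message.hash(&mut h);
    prev_hash.hash(&mut h);
    h.finish()
}

#[derive(Debug, Clone, Default)]
pub struct AuditChain {
    pub entries: Vec<ChainedEntry>,
}

impl AuditChain {
    /// Append-only: this API never updates or removes entries.
    pub fn append(&mut self, message: &str) {
        let (seq, prev_hash) = match self.entries.last() {
            Some(last) => (last.seq + 1, last.hash),
            None => (0, 0), // the first entry links to a zero hash
        };
        let hash = entry_hash(seq, message, prev_hash);
        self.entries.push(ChainedEntry { seq, message: message.to_string(), prev_hash, hash });
    }

    /// Ok(()) if intact, otherwise the index of the first bad entry.
    pub fn verify(&self) -> Result<(), usize> {
        let mut expected_prev = 0;
        for (i, e) in self.entries.iter().enumerate() {
            let gap = e.seq != i as u64; // missing or reordered entry
            let broken_link = e.prev_hash != expected_prev;
            let modified = e.hash != entry_hash(e.seq, &e.message, e.prev_hash);
            if gap || broken_link || modified {
                return Err(i);
            }
            expected_prev = e.hash;
        }
        Ok(())
    }
}
```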

4.3 Low-Risk Threats

Threat: Information Disclosure via Error Messages

Attack Vector: Error messages leak internal information

Mitigations:

Generic error messages for users
Detailed errors only in server logs
Error message sanitization

Detection:

Code review for error handling
Automated scanning for sensitive data in responses

5. Compliance Considerations

5.1 GDPR Compliance

Personal Data Handling:

User information: username, email, IP addresses
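Retention: Audit logs kept for the required period (see the retention settings in 1.3)
Right to erasure: Deleting a user removes their SSH keys and account record; their audit log entries are anonymized rather than deleted, so the compliance trail survives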

Implementation:

// Delete user and all associated data
pub async fn delete_user(&self, user_id: &str) -> Result<(), RbacError> {
    // Delete user SSH keys
    for key in self.list_user_ssh_keys(user_id).await? {
        self.delete_ssh_key(&key.key_id).await?;
    }

    // Anonymize audit logs (retain for compliance, remove PII)
    self.anonymize_user_audit_logs(user_id).await?;

    // Delete user record
    self.delete_user_record(user_id).await?;

    Ok(())
}
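`anonymize_user_audit_logs` is only named above. One reading is that it strips the user identifier and IP address from that user's entries while keeping the event itself. Below is a standalone sketch over a simplified, hypothetical `AuditRecord` type; the `[erased]` placeholder is an assumption. If the audit log is also hash-chained (section 4.2), rewriting entries in place will fail verification. In that case, the chain would need to cover a pseudonym, or be re-anchored after erasure.

```rust
/// Placeholder written in place of the erased user identifier.
pub const ERASED_USER: &str = "[erased]";

#[derive(Debug, Clone, PartialEq)]
pub struct AuditRecord {
    pub key_id: String,
    pub action: String,
    pub user: Option<String>,
    pub ip_address: Option<String>,
}

/// Removes personal data (user ID, IP address) from one user's entries while
/// keeping what happened and to which key. Returns the number of entries changed.
pub fn anonymize_user_audit_logs(records: &mut [AuditRecord], user_id: &str) -> usize {
    let mut changed = 0;
    for record in records.iter_mut() {
        if record.user.as_deref() == Some(user_id) {
            record.user = Some(ERASED_USER.to_string());
            record.ip_address = None;
            changed += 1;
        }
    }
    changed
}
```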

5.2 SOC 2 Compliance

Security Controls:

✅ Access control (RBAC)
✅ Audit logging (all actions logged)
✅ Encryption at rest (KMS)
✅ Encryption in transit (HTTPS recommended)
✅ Session management (timeouts; MFA is planned for Phase 3)

Monitoring & Alerting:

✅ Service health monitoring
✅ Failed login tracking
✅ Permission denial alerting
✅ Unusual activity detection

5.3 PCI DSS (if applicable)

Requirements:

✅ Encrypt cardholder data (use KMS for keys)
✅ Maintain access control (RBAC)
✅ Track and monitor access (audit logs)
✅ Regularly test security (integration tests, plus the vulnerability scans and penetration tests PCI DSS requires)

6. Security Best Practices

6.1 Development

Code Review Checklist:

All API endpoints have RBAC middleware
No hardcoded secrets or keys
Error messages don't leak sensitive info
Audit logging for sensitive operations
Input validation on all user inputs
SQL injection prevention (use parameterized queries)
XSS prevention (encode user-supplied data when rendering output)

Testing:

Unit tests for permission checks
Integration tests for RBAC enforcement
Penetration testing for production deployments

6.2 Deployment

Production Checklist:

Change default admin password
Enable HTTPS with valid certificate
Configure firewall rules (internal services only)
Set appropriate execution mode (Enterprise for production)
Enable audit logging
Configure session timeout (30 minutes for Enterprise)
Enable rate limiting
Set up log monitoring and alerting
Regular security updates
Backup encryption keys

6.3 Operations

Incident Response:

Detection: Monitor audit logs for anomalies
Containment: Revoke compromised credentials
Eradication: Rotate affected SSH keys
Recovery: Restore from backup if needed
Lessons Learned: Update security controls

Key Rotation Schedule:

SSH keys: Every 90 days (Enterprise: 30 days)
JWT signing keys: Every 180 days
Master encryption key: Every 365 days
Service account tokens: Every 30 days

7. Security Metrics

7.1 Monitoring Metrics

Authentication:

Failed login attempts per user
Concurrent sessions per user
Session duration (average, p95, p99)

Authorization:

Permission denials per user
Permission denials per resource
Role assignments per day

Audit:

SSH key accesses per day
SSH key rotations per month
Audit log retention compliance

Services:

Service health check success rate
Service response times (p50, p95, p99)
Service dependency failures

7.2 Alerting Thresholds

Critical Alerts:

Service health: >3 failures in 5 minutes
Failed logins: >10 attempts in 1 minute
Permission denials: >50 in 1 minute
SSH key bulk retrieval: >10 keys in 1 minute
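Most of these critical thresholds have the form "more than N events within a time window". Below is a per-key sliding-window sketch using a hypothetical `RateAlert`, configured for the bulk-retrieval rule: more than 10 key retrievals by one user in 60 seconds.

```rust
use std::collections::{HashMap, VecDeque};
use std::time::{Duration, Instant};

/// Fires when more than `threshold` events for one key fall inside `window`.
pub struct RateAlert {
    threshold: usize,
    window: Duration,
    events: HashMap<String, VecDeque<Instant>>,
}

impl RateAlert {
    pub fn new(threshold: usize, window: Duration) -> Self {
        Self { threshold, window, events: HashMap::new() }
    }

    /// Records one event (e.g. a key retrieval, with `key` = user ID) and
    /// returns true if the alert threshold is now exceeded.
    pub fn record(&mut self, key: &str, now: Instant) -> bool {
        let queue = self.events.entry(key.to_string()).or_default();
        // Drop events that have slid out of the window.
        while let Some(&oldest) = queue.front() {
            if now.duration_since(oldest) >= self.window {
                queue.pop_front();
            } else {
                break;
            }
        }
        queue.push_back(now);
        queue.len() > self.threshold
    }
}
```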

Warning Alerts:

Service degraded: response time >1 second
Session timeout rate: >10% of sessions
Audit log storage: >80% capacity

8. Security Roadmap

Phase 1 (Completed)

✅ SSH key storage with encryption
✅ Mode-based RBAC
✅ Audit logging
✅ Platform monitoring

Phase 2 (In Progress)

📋 API handlers with RBAC enforcement
📋 Integration tests for security
📋 Documentation

Phase 3 (Future)

Multi-factor authentication (MFA)
Hardware security module (HSM) integration
Advanced threat detection (ML-based)
Automated security scanning
Compliance report generation
Security information and event management (SIEM) integration

References

OWASP Top 10: https://owasp.org/www-project-top-ten/
NIST Cybersecurity Framework: https://www.nist.gov/cyberframework
CIS Controls: https://www.cisecurity.org/controls
GDPR: https://gdpr.eu/
SOC 2: https://www.aicpa.org/interestareas/frc/assuranceadvisoryservices/socforserviceorganizations.html

Last Updated: 2025-10-06 Review Cycle: Quarterly Next Review: 2026-01-06
