# Security Considerations for Control Center Enhancements

## Overview

This document outlines the security architecture and considerations for the control-center enhancements, including KMS SSH key management, mode-based RBAC, and platform service monitoring.

## 1. SSH Key Management Security

### 1.1 Key Storage Security

**Implementation**:
- Private keys encrypted at rest using AES-256-GCM in KMS
- Public keys stored in plaintext (as they are meant to be public)
- Private key material never exposed in API responses
- Key IDs used as references, not actual keys

**Threat Mitigation**:
- ✅ **Data at Rest**: All private keys encrypted with master encryption key
- ✅ **Key Exposure**: Private keys only decrypted in memory when needed
- ✅ **Key Leakage**: Zeroization of key material after use
- ✅ **Unauthorized Access**: KMS access controlled by RBAC

**Best Practices**:
```rust
// Good: Using key ID reference
let key_id = ssh_key_manager.store_ssh_key(name, private, public, purpose, tags).await?;

// Bad: Never do this - exposing private key in logs
tracing::info!("Stored key: {}", private_key);  // DON'T DO THIS
```
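
One way to make the logging mistake above harder to commit is to wrap private key bytes in a type whose `Debug` output is redacted and whose buffer is overwritten on drop. The sketch below is illustrative only: `SecretBytes`, `expose`, and `wipe` are hypothetical names, not part of the control-center API, and production code would more likely rely on a vetted crate such as `zeroize`.

```rust
use std::fmt;
use std::sync::atomic::{compiler_fence, Ordering};

/// Hypothetical wrapper for private key material (not the actual KMS type).
pub struct SecretBytes(Vec<u8>);

impl SecretBytes {
    pub fn new(bytes: Vec<u8>) -> Self {
        SecretBytes(bytes)
    }

    /// Explicit, greppable access to the raw bytes.
    pub fn expose(&self) -> &[u8] {
        &self.0
    }
}

/// Overwrite a buffer with zeros using volatile writes, so the compiler
/// is less likely to optimize the wipe away.
pub fn wipe(buf: &mut [u8]) {
    for byte in buf.iter_mut() {
        // SAFETY: `byte` is a valid, aligned, exclusive reference.
        unsafe { std::ptr::write_volatile(byte, 0) };
    }
    compiler_fence(Ordering::SeqCst);
}

impl Drop for SecretBytes {
    fn drop(&mut self) {
        wipe(&mut self.0);
    }
}

// Debug never prints the contents, so `{:?}` in a log line is harmless.
impl fmt::Debug for SecretBytes {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "SecretBytes(<redacted, {} bytes>)", self.0.len())
    }
}
```

Because the type does not implement `Display`, a call such as `tracing::info!("Stored key: {}", secret)` would fail to compile, and `{:?}` would only print the length.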

### 1.2 Key Rotation Security

**Implementation**:
- Configurable rotation intervals (default 90 days)
- Grace period for old key usage (default 7 days)
- Automatic rotation scheduling (if enabled)
- Manual rotation support with immediate effect

**Threat Mitigation**:
- ✅ **Key Compromise**: Regular rotation limits exposure window
- ✅ **Stale Keys**: Automated detection of keys due for rotation
- ✅ **Rotation Failures**: Graceful degradation with error logging

**Rotation Policy**:
```toml
[kms.ssh_keys]
rotation_enabled = true
rotation_interval_days = 90   # Enterprise: 30, Dev: 180
grace_period_days = 7          # Time to update deployed keys
auto_rotate = false            # Manual approval recommended
```
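
As a rough illustration of how these settings could drive stale-key detection, the sketch below classifies a key by its age. It assumes the grace period is counted from the end of the rotation interval; `RotationPolicy`, `KeyState`, and `key_state` are hypothetical names and may not match the actual KMS code.

```rust
/// Mirrors the `[kms.ssh_keys]` settings shown above (hypothetical type).
#[derive(Debug, Clone, Copy)]
pub struct RotationPolicy {
    pub rotation_interval_days: u32,
    pub grace_period_days: u32,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KeyState {
    /// Younger than the rotation interval.
    Current,
    /// Past the interval but still inside the grace period.
    DueForRotation,
    /// Past interval plus grace period; should no longer be accepted.
    Expired,
}

/// Classify a key by its age in days under the given policy.
pub fn key_state(age_days: u32, policy: RotationPolicy) -> KeyState {
    let hard_limit = policy
        .rotation_interval_days
        .saturating_add(policy.grace_period_days);
    if age_days < policy.rotation_interval_days {
        KeyState::Current
    } else if age_days < hard_limit {
        KeyState::DueForRotation
    } else {
        KeyState::Expired
    }
}
```

With the defaults above (90-day interval, 7-day grace period), a key is flagged on day 90 and treated as expired from day 97.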

### 1.3 Audit Logging

**Logged Events**:
- SSH key creation (who, when, purpose)
- SSH key retrieval (who accessed, when)
- SSH key rotation (old key ID, new key ID)
- SSH key deletion (who deleted, when)
- Failed access attempts

**Audit Entry Structure**:
```rust
pub struct SshKeyAuditEntry {
    pub timestamp: DateTime<Utc>,
    pub key_id: String,
    pub action: SshKeyAction,
    pub user: Option<String>,      // User who performed action
    pub ip_address: Option<String>, // Source IP
    pub success: bool,
    pub error_message: Option<String>,
}
```

**Threat Mitigation**:
- ✅ **Unauthorized Access**: Full audit trail for forensics
- ✅ **Insider Threats**: User attribution for all actions
- ✅ **Compliance**: Supports GDPR and SOC 2 audit-logging requirements

**Audit Log Retention**:
- In-memory: Last 10,000 entries
- Persistent: SurrealDB with 1-year retention
- Compliance mode: 7-year retention (configurable)
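
The in-memory tier can be pictured as a bounded queue that evicts the oldest entry once the cap is reached. This is a minimal sketch under that assumption; `BoundedLog` is a hypothetical name, and the real implementation may hand evicted entries to the persistent store differently.

```rust
use std::collections::VecDeque;

/// Fixed-capacity, first-in-first-out log buffer (hypothetical type).
pub struct BoundedLog<T> {
    entries: VecDeque<T>,
    capacity: usize,
}

impl<T> BoundedLog<T> {
    /// Create a buffer holding at most `capacity` entries (minimum 1).
    pub fn new(capacity: usize) -> Self {
        let capacity = capacity.max(1);
        BoundedLog {
            entries: VecDeque::with_capacity(capacity),
            capacity,
        }
    }

    /// Append an entry. If the buffer was full, the oldest entry is
    /// removed and returned so the caller can persist it.
    pub fn push(&mut self, entry: T) -> Option<T> {
        let evicted = if self.entries.len() == self.capacity {
            self.entries.pop_front()
        } else {
            None
        };
        self.entries.push_back(entry);
        evicted
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }

    pub fn oldest(&self) -> Option<&T> {
        self.entries.front()
    }
}
```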

### 1.4 Key Fingerprinting

**Implementation**:
```rust
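fn calculate_fingerprint(public_key: &[u8]) -> Result<String, KmsError> {
    use sha2::{Digest, Sha256};

    // Hash the raw public key bytes with SHA-256.
    let mut hasher = Sha256::new();
    hasher.update(public_key);
    let result = hasher.finalize();

    // Note: only the first 16 bytes of the digest are encoded, so this value
    // is shorter than (and will not equal) the fingerprint printed by
    // `ssh-keygen -l`, which encodes the full 32-byte digest.
    Ok(format!("SHA256:{}", base64::encode(&result[..16])))
}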
```

**Security Benefits**:
- Verify key integrity
- Detect key tampering
- Match deployed keys to KMS records

## 2. RBAC Security

### 2.1 Execution Modes

**Security Model by Mode**:

| Mode | Security Level | Use Case | Audit Required |
|------|---------------|----------|----------------|
| Solo | Low | Single developer | No |
| MultiUser | Medium | Small teams | Optional |
| CICD | Medium | Automation | Yes |
| Enterprise | High | Production | Mandatory |

**Mode-Specific Security**:

#### Solo Mode
```rust
// Solo mode: All users are admin
// Security: Trust-based, no RBAC checks
if mode == ExecutionMode::Solo {
    return true;  // Allow all operations
}
```

**Risks**:
- No access control
- No audit trail
- Single point of failure

**Mitigations**:
- Only for development environments
- Network isolation required
- Regular backups

#### MultiUser Mode
```rust
// Multi-user: Role-based access control
let permissions = rbac_manager.get_user_permissions(&user).await;
if !permissions.contains(&required_permission) {
    return Err(RbacError::PermissionDenied);
}
```

**Security Features**:
- Role-based permissions
- Optional audit logging
- Session management

#### CICD Mode
```rust
// CICD: Service account focused
// All actions logged for automation tracking
if mode == ExecutionMode::CICD {
    audit_log.log_automation_action(service_account, action).await;
}
```

**Security Features**:
- Service account isolation
- Mandatory audit logging
- Token-based authentication
- Short-lived credentials

#### Enterprise Mode
```rust
// Enterprise: Full security
// - Mandatory audit logging
// - Stricter session timeouts
// - Compliance reports
if mode == ExecutionMode::Enterprise {
    audit_log.log_with_compliance(user, action, compliance_tags).await;
}
```

**Security Features**:
- Full RBAC enforcement
- Comprehensive audit logging
- Compliance reporting
- Role assignment approval workflow
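
The per-mode rules from the table above can be collected in one place rather than scattered across `if mode == ...` checks. The following is a sketch only: `AuditPolicy`, `audit_policy`, `enforces_rbac`, and `should_audit` are hypothetical names, and the `audit_opt_in` flag stands in for whatever setting enables optional auditing in MultiUser mode.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExecutionMode {
    Solo,
    MultiUser,
    CICD,
    Enterprise,
}

/// Hypothetical encoding of the "Audit Required" column.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AuditPolicy {
    Off,
    Optional,
    Required,
}

impl ExecutionMode {
    pub fn audit_policy(self) -> AuditPolicy {
        match self {
            ExecutionMode::Solo => AuditPolicy::Off,
            ExecutionMode::MultiUser => AuditPolicy::Optional,
            ExecutionMode::CICD | ExecutionMode::Enterprise => AuditPolicy::Required,
        }
    }

    /// Solo mode is trust-based; every other mode checks permissions.
    pub fn enforces_rbac(self) -> bool {
        !matches!(self, ExecutionMode::Solo)
    }
}

/// Decide whether an action must be written to the audit log.
pub fn should_audit(mode: ExecutionMode, audit_opt_in: bool) -> bool {
    match mode.audit_policy() {
        AuditPolicy::Off => false,
        AuditPolicy::Optional => audit_opt_in,
        AuditPolicy::Required => true,
    }
}
```

Keeping the mapping in a single `match` means the compiler flags every decision point if a new mode is added.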

### 2.2 Permission System

**Permission Levels**:
```rust
Role::Admin          => 100  // Full access
Role::Operator       =>  80  // Deploy & manage
Role::Developer      =>  60  // Read + dev deploy
Role::ServiceAccount =>  50  // Automation
Role::Auditor        =>  40  // Read + audit
Role::Viewer         =>  20  // Read-only
```

**Action Security Levels**:
```rust
Action::Delete      => 100  // Destructive, admin only
Action::Manage      =>  80  // Service management
Action::Deploy      =>  80  // Deploy to production
Action::Create      =>  60  // Create resources
Action::Update      =>  60  // Modify resources
Action::Execute     =>  50  // Execute operations
Action::Audit       =>  40  // View audit logs
Action::Read        =>  20  // View resources
```

**Permission Check**:
```rust
pub fn can_perform(&self, required_level: u8) -> bool {
    self.permission_level() >= required_level
}
```

**Security Guarantees**:
- ✅ Least privilege by default (Viewer role)
- ✅ Hierarchical permissions (higher roles include lower)
- ✅ Explicit deny for unknown resources
- ✅ No permission escalation without admin
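
Putting the two level tables and the check together gives a small self-contained model. The numeric levels are taken from the tables above; the enum layout, the `can` helper, and the `Default` implementation are assumptions made for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    Admin,
    Operator,
    Developer,
    ServiceAccount,
    Auditor,
    Viewer,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    Delete,
    Manage,
    Deploy,
    Create,
    Update,
    Execute,
    Audit,
    Read,
}

impl Action {
    /// Minimum permission level needed to perform this action.
    pub fn required_level(self) -> u8 {
        match self {
            Action::Delete => 100,
            Action::Manage | Action::Deploy => 80,
            Action::Create | Action::Update => 60,
            Action::Execute => 50,
            Action::Audit => 40,
            Action::Read => 20,
        }
    }
}

impl Role {
    pub fn permission_level(self) -> u8 {
        match self {
            Role::Admin => 100,
            Role::Operator => 80,
            Role::Developer => 60,
            Role::ServiceAccount => 50,
            Role::Auditor => 40,
            Role::Viewer => 20,
        }
    }

    pub fn can_perform(self, required_level: u8) -> bool {
        self.permission_level() >= required_level
    }

    /// Convenience wrapper (hypothetical): check an action directly.
    pub fn can(self, action: Action) -> bool {
        self.can_perform(action.required_level())
    }
}

/// Least privilege: an unspecified role falls back to Viewer.
impl Default for Role {
    fn default() -> Self {
        Role::Viewer
    }
}
```

One consequence of a purely numeric comparison is that any role at level 40 or above, including Developer and ServiceAccount, passes the `Audit` check. If audit-log access should be limited to Auditor and Admin, it would need a rule beyond the level comparison.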

### 2.3 Session Security

**Session Configuration**:
```toml
[security]
session_timeout_minutes = 60     # Solo/MultiUser default; set to 30 in Enterprise mode
max_sessions_per_user = 5
failed_login_lockout_attempts = 5
failed_login_lockout_duration_minutes = 15
```

**Session Lifecycle**:
1. User authenticates → JWT token issued
2. Token includes: user_id, role, issued_at, expires_at
3. Middleware validates token on each request
4. Session tracked in Redis/RocksDB
5. Session invalidated on logout or timeout

**Security Features**:
- JWT with RSA-2048 signatures
- Refresh token rotation
- Session fixation prevention
- Concurrent session limits

**Threat Mitigation**:
- ✅ **Session Hijacking**: Short-lived tokens (1 hour)
- ✅ **Token Replay**: One-time refresh tokens
- ✅ **Brute Force**: Account lockout after 5 failures
- ✅ **Session Fixation**: New session ID on login
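
The brute-force mitigation can be sketched as a per-user failure counter with a timed lockout. This is an in-memory illustration under assumed semantics (consecutive failures, counter reset on success or after the lockout expires); `LockoutTracker` is a hypothetical name, and the clock is passed in explicitly so the logic can be tested without sleeping.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical in-memory lockout tracker.
pub struct LockoutTracker {
    max_attempts: u32,
    lockout: Duration,
    /// user -> (consecutive failures, locked until)
    state: HashMap<String, (u32, Option<Instant>)>,
}

impl LockoutTracker {
    pub fn new(max_attempts: u32, lockout: Duration) -> Self {
        LockoutTracker {
            max_attempts,
            lockout,
            state: HashMap::new(),
        }
    }

    /// True while the user's lockout window has not yet elapsed.
    pub fn is_locked(&self, user: &str, now: Instant) -> bool {
        matches!(self.state.get(user), Some((_, Some(until))) if now < *until)
    }

    /// Record a failed login; locks the account once the limit is reached.
    pub fn record_failure(&mut self, user: &str, now: Instant) {
        let entry = self.state.entry(user.to_string()).or_insert((0, None));
        // An expired lockout starts a fresh count.
        if matches!(entry.1, Some(until) if now >= until) {
            *entry = (0, None);
        }
        entry.0 += 1;
        if entry.0 >= self.max_attempts {
            entry.1 = Some(now + self.lockout);
        }
    }

    /// A successful login clears the failure history.
    pub fn record_success(&mut self, user: &str) {
        self.state.remove(user);
    }
}
```

With the configuration above this would be constructed as `LockoutTracker::new(5, Duration::from_secs(15 * 60))`. A multi-instance deployment would need the counters in shared storage instead of process memory.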

### 2.4 Middleware Security

**RBAC Middleware Flow**:
```
Request → Auth Middleware → RBAC Middleware → Handler
            ↓                    ↓
        Extract User      Check Permission
        from JWT Token    (role + resource + action)
                               ↓
                         Allow / Deny
```

**Middleware Implementation**:
```rust
pub async fn check_permission(
    State(state): State<Arc<RbacMiddleware>>,
    resource: Resource,
    action: Action,
    req: Request,
    next: Next,
) -> Result<Response, RbacError> {
    // The auth middleware is expected to have stored the user in the request extensions.
    let user = req.extensions()
        .get::<User>()
        .ok_or_else(|| RbacError::UserNotFound("No user in request".to_string()))?;

    if !state.rbac_manager.check_permission(user, resource, action).await {
        return Err(RbacError::PermissionDenied);
    }

    Ok(next.run(req).await)
}
```

**Security Guarantees**:
- ✅ All API endpoints protected by default
- ✅ Permission checked before handler execution
- ✅ User context available in handlers
- ✅ Failed checks logged for audit

## 3. Platform Monitoring Security

### 3.1 Service Access Security

**Internal URLs Only**:
```toml
[platform]
orchestrator_url = "http://localhost:8080"       # Not exposed externally
coredns_url = "http://localhost:9153"
gitea_url = "http://localhost:3000"
oci_registry_url = "http://localhost:5000"
```

**Network Security**:
- All services on localhost or internal network
- No external exposure of monitoring endpoints
- Firewall rules to prevent external access

**Threat Mitigation**:
- ✅ **External Scanning**: Services not reachable from internet
- ✅ **DDoS**: Internal-only access limits attack surface
- ✅ **Data Exfiltration**: Monitoring data not exposed externally
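
A startup check can reject configuration that points a platform URL at a non-internal host. The sketch below covers only the host classification and assumes the host has already been extracted from the URL; `is_internal_host` is a hypothetical helper, and hostnames other than `localhost` are conservatively treated as external because they would require DNS resolution to classify.

```rust
use std::net::IpAddr;

/// Returns true for `localhost`, loopback addresses, and RFC 1918
/// private IPv4 ranges. Anything else is treated as external.
pub fn is_internal_host(host: &str) -> bool {
    if host.eq_ignore_ascii_case("localhost") {
        return true;
    }
    match host.parse::<IpAddr>() {
        Ok(IpAddr::V4(v4)) => v4.is_loopback() || v4.is_private(),
        Ok(IpAddr::V6(v6)) => v6.is_loopback(),
        // Unresolved hostnames are not trusted by this sketch.
        Err(_) => false,
    }
}
```

This complements, and does not replace, the firewall rules listed above: a bind address or proxy misconfiguration can still expose a service that is configured with an internal URL.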

### 3.2 Health Check Security

**Timeout Protection**:
```rust
let client = Client::builder()
    .timeout(std::time::Duration::from_secs(5))  // Prevent hanging
    .build()
    .expect("failed to build HTTP client");
```

**Error Handling**:
```rust
// Never expose internal errors to users
Err(e) => {
    // Log detailed error internally
    tracing::error!("Health check failed for {}: {}", service, e);

    // Return generic error externally
    ServiceStatus {
        status: HealthStatus::Unhealthy,
        error_message: Some("Service unavailable".to_string()),  // Generic
        ..Default::default()  // remaining fields; assumes ServiceStatus implements Default
    }
}
```

**Threat Mitigation**:
- ✅ **Timeout Attacks**: 5-second timeout prevents resource exhaustion
- ✅ **Information Disclosure**: Error messages sanitized
- ✅ **Resource Exhaustion**: Parallel checks with concurrency limits
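
The concurrency limit on parallel checks can be illustrated with standard-library scoped threads that run the checks in batches. This is a simplified stand-in: the actual service is async and would more likely use a semaphore or a buffered stream, and `run_bounded` is a hypothetical name.

```rust
use std::thread;

/// Run every check, with at most `limit` of them executing at once.
/// Results are returned in the same order as the input.
pub fn run_bounded<T, F>(checks: Vec<F>, limit: usize) -> Vec<T>
where
    F: FnOnce() -> T + Send,
    T: Send,
{
    let limit = limit.max(1);
    let mut results = Vec::with_capacity(checks.len());
    let mut pending = checks.into_iter().peekable();

    while pending.peek().is_some() {
        // Take the next batch of at most `limit` checks.
        let batch: Vec<F> = pending.by_ref().take(limit).collect();
        thread::scope(|s| {
            let handles: Vec<_> = batch.into_iter().map(|check| s.spawn(check)).collect();
            // The scope joins all threads before the next batch starts.
            for handle in handles {
                results.push(handle.join().expect("health check thread panicked"));
            }
        });
    }
    results
}
```

Batching is the simplest way to enforce the bound, at the cost of waiting for the slowest check in each batch. Combined with the 5-second timeout, a batch takes at most roughly 5 seconds.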

### 3.3 Service Control Security

**RBAC-Protected Service Control**:
```rust
// Only Operator or Admin can start/stop services
#[axum::debug_handler]
pub async fn start_service(
    State(state): State<AppState>,
    Extension(user): Extension<User>,
    Path(service_type): Path<String>,
) -> Result<StatusCode, ApiError> {
    // Check permission (the managers are assumed to be fields of `AppState`)
    if !state.rbac_manager.check_permission(
        &user,
        Resource::Service,
        Action::Manage,
    ).await {
        return Err(ApiError::PermissionDenied);
    }

    // Start service
    state.service_manager.start_service(&service_type).await?;

    // Audit log
    state.audit_log.log_service_action(user, service_type, "start").await;

    Ok(StatusCode::OK)
}
```

**Security Guarantees**:
- ✅ Only authorized users can control services
- ✅ All service actions logged
- ✅ Graceful degradation on service failure

## 4. Threat Model

### 4.1 High-Risk Threats

#### Threat: SSH Private Key Exposure
**Attack Vector**: Attacker gains access to KMS database

**Mitigations**:
- Private keys encrypted at rest with master key
- Master key stored in hardware security module (HSM) or KMS
- Key access audited and rate-limited
- Zeroization of decrypted keys in memory

**Detection**:
- Audit log monitoring for unusual key access patterns
- Alerting on bulk key retrievals

#### Threat: Privilege Escalation
**Attack Vector**: Lower-privileged user attempts to gain admin access

**Mitigations**:
- Role assignment requires Admin role
- Mode switching requires Admin role
- Middleware enforces permissions on every request
- No client-side permission checks (server-side only)

**Detection**:
- Failed permission checks logged
- Alerting on repeated permission denials

#### Threat: Session Hijacking
**Attack Vector**: Attacker steals JWT token

**Mitigations**:
- Short-lived access tokens (1 hour)
- Refresh token rotation
- Secure HTTP-only cookies (recommended)
- IP address binding (optional)

**Detection**:
- Unusual login locations
- Concurrent sessions from different IPs

### 4.2 Medium-Risk Threats

#### Threat: Service Impersonation
**Attack Vector**: Malicious service pretends to be legitimate platform service

**Mitigations**:
- Service URLs configured in config file (not dynamic)
- TLS certificate validation (if HTTPS)
- Service authentication tokens

**Detection**:
- Health check failures
- Metrics anomalies

#### Threat: Audit Log Tampering
**Attack Vector**: Attacker modifies audit logs to hide tracks

**Mitigations**:
- Audit logs append-only (no update or delete path exposed)
- Logs persisted in SurrealDB
- Hash chain over log entries for tamper evidence
- Offsite log backup

**Detection**:
- Hash chain verification
- Log gap detection
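
The hash-chain idea can be sketched as follows: each entry stores the hash of its predecessor, so editing or removing an entry breaks every later link. For the sake of a dependency-free example this uses the standard library's `DefaultHasher`, which is **not** cryptographic and whose algorithm is not guaranteed to stay stable across Rust releases; a real audit log would use SHA-256 or similar. `ChainedEntry`, `append`, and `verify` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone)]
pub struct ChainedEntry {
    pub message: String,
    pub prev_hash: u64,
    pub hash: u64,
}

/// Hash of (previous hash, message). Stand-in for a cryptographic hash.
fn link_hash(prev_hash: u64, message: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    prev_hash.hash(&mut hasher);
    message.hash(&mut hasher);
    hasher.finish()
}

/// Append a message, linking it to the current tail (0 for the first entry).
pub fn append(chain: &mut Vec<ChainedEntry>, message: &str) {
    let prev_hash = chain.last().map_or(0, |entry| entry.hash);
    chain.push(ChainedEntry {
        message: message.to_string(),
        prev_hash,
        hash: link_hash(prev_hash, message),
    });
}

/// Recompute every link; any edited, removed, or reordered entry fails.
pub fn verify(chain: &[ChainedEntry]) -> bool {
    let mut expected_prev = 0u64;
    for entry in chain {
        if entry.prev_hash != expected_prev
            || entry.hash != link_hash(entry.prev_hash, &entry.message)
        {
            return false;
        }
        expected_prev = entry.hash;
    }
    true
}
```

An attacker who can rewrite the entire log can also recompute the chain, which is why the offsite backup (or periodically publishing the latest hash elsewhere) matters alongside verification.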

### 4.3 Low-Risk Threats

#### Threat: Information Disclosure via Error Messages
**Attack Vector**: Error messages leak internal information

**Mitigations**:
- Generic error messages for users
- Detailed errors only in server logs
- Error message sanitization

**Detection**:
- Code review for error handling
- Automated scanning for sensitive data in responses

## 5. Compliance Considerations

### 5.1 GDPR Compliance

**Personal Data Handling**:
- User information: username, email, IP addresses
- Retention: Audit logs kept for required period
- Right to erasure: User deletion removes the user's SSH keys and user record; audit log entries are retained but anonymized

**Implementation**:
```rust
// Delete user and all associated data
pub async fn delete_user(&self, user_id: &str) -> Result<(), RbacError> {
    // Delete user SSH keys
    for key in self.list_user_ssh_keys(user_id).await? {
        self.delete_ssh_key(&key.key_id).await?;
    }

    // Anonymize audit logs (retain for compliance, remove PII)
    self.anonymize_user_audit_logs(user_id).await?;

    // Delete user record
    self.delete_user_record(user_id).await?;

    Ok(())
}
```
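
The anonymization step might look like the following in-memory sketch, which replaces the user identifier with a pseudonym and drops the source IP while leaving the rest of the entry intact. `AuditRecord` is a simplified, hypothetical stand-in for `SshKeyAuditEntry`, and the choice of pseudonym is left to the caller.

```rust
/// Simplified stand-in for an audit entry (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct AuditRecord {
    pub key_id: String,
    pub user: Option<String>,
    pub ip_address: Option<String>,
}

/// Replace `user_id` with `pseudonym` and clear the IP address on every
/// matching record. Returns the number of records changed.
pub fn anonymize_user(records: &mut [AuditRecord], user_id: &str, pseudonym: &str) -> usize {
    let mut changed = 0;
    for record in records
        .iter_mut()
        .filter(|r| r.user.as_deref() == Some(user_id))
    {
        record.user = Some(pseudonym.to_string());
        record.ip_address = None;
        changed += 1;
    }
    changed
}
```

If the audit log is protected by a hash chain, rewriting entries in place would break verification, so the two mechanisms need to be designed together (for example, by keeping PII outside the hashed payload).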

### 5.2 SOC 2 Compliance

**Security Controls**:
- ✅ Access control (RBAC)
- ✅ Audit logging (all actions logged)
- ✅ Encryption at rest (KMS)
- ✅ Encryption in transit (HTTPS recommended)
- ✅ Session management (timeout; MFA is planned for Phase 3)

**Monitoring & Alerting**:
- ✅ Service health monitoring
- ✅ Failed login tracking
- ✅ Permission denial alerting
- ✅ Unusual activity detection

### 5.3 PCI DSS (if applicable)

**Requirements**:
- ✅ Encrypt cardholder data (use KMS for keys)
- ✅ Maintain access control (RBAC)
- ✅ Track and monitor access (audit logs)
- ✅ Regularly test security (integration tests)

## 6. Security Best Practices

### 6.1 Development

**Code Review Checklist**:
- [ ] All API endpoints have RBAC middleware
- [ ] No hardcoded secrets or keys
- [ ] Error messages don't leak sensitive info
- [ ] Audit logging for sensitive operations
- [ ] Input validation on all user inputs
- [ ] SQL injection prevention (use parameterized queries)
- [ ] XSS prevention (escape user inputs)

**Testing**:
- Unit tests for permission checks
- Integration tests for RBAC enforcement
- Penetration testing for production deployments

### 6.2 Deployment

**Production Checklist**:
- [ ] Change default admin password
- [ ] Enable HTTPS with valid certificate
- [ ] Configure firewall rules (internal services only)
- [ ] Set appropriate execution mode (Enterprise for production)
- [ ] Enable audit logging
- [ ] Configure session timeout (30 minutes for Enterprise)
- [ ] Enable rate limiting
- [ ] Set up log monitoring and alerting
- [ ] Regular security updates
- [ ] Backup encryption keys

### 6.3 Operations

**Incident Response**:
1. **Detection**: Monitor audit logs for anomalies
2. **Containment**: Revoke compromised credentials
3. **Eradication**: Rotate affected SSH keys
4. **Recovery**: Restore from backup if needed
5. **Lessons Learned**: Update security controls

**Key Rotation Schedule**:
- SSH keys: Every 90 days (Enterprise: 30 days)
- JWT signing keys: Every 180 days
- Master encryption key: Every 365 days
- Service account tokens: Every 30 days

## 7. Security Metrics

### 7.1 Monitoring Metrics

**Authentication**:
- Failed login attempts per user
- Concurrent sessions per user
- Session duration (average, p95, p99)

**Authorization**:
- Permission denials per user
- Permission denials per resource
- Role assignments per day

**Audit**:
- SSH key accesses per day
- SSH key rotations per month
- Audit log retention compliance

**Services**:
- Service health check success rate
- Service response times (p50, p95, p99)
- Service dependency failures

### 7.2 Alerting Thresholds

**Critical Alerts**:
- Service health: >3 failures in 5 minutes
- Failed logins: >10 attempts in 1 minute
- Permission denials: >50 in 1 minute
- SSH key bulk retrieval: >10 keys in 1 minute

**Warning Alerts**:
- Service degraded: response time >1 second
- Session timeout rate: >10% of sessions
- Audit log storage: >80% capacity
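
Thresholds of the form "more than N events in a time window" can be evaluated with a sliding-window counter. The sketch below is one possible approach, not the platform's alerting code; `RateAlert` is a hypothetical name and the clock is passed in explicitly for testability.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Sliding-window event counter (hypothetical type).
pub struct RateAlert {
    threshold: usize,
    window: Duration,
    events: VecDeque<Instant>,
}

impl RateAlert {
    pub fn new(threshold: usize, window: Duration) -> Self {
        RateAlert {
            threshold,
            window,
            events: VecDeque::new(),
        }
    }

    /// Record one event at `now`. Returns true when more than
    /// `threshold` events fall inside the window ending at `now`.
    pub fn record(&mut self, now: Instant) -> bool {
        // Drop events that have aged out of the window.
        while let Some(&oldest) = self.events.front() {
            if now.duration_since(oldest) >= self.window {
                self.events.pop_front();
            } else {
                break;
            }
        }
        self.events.push_back(now);
        self.events.len() > self.threshold
    }
}
```

For the failed-login alert above, this would be `RateAlert::new(10, Duration::from_secs(60))`, with `record` called on each failed attempt.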

## 8. Security Roadmap

### Phase 1 (Completed)
- ✅ SSH key storage with encryption
- ✅ Mode-based RBAC
- ✅ Audit logging
- ✅ Platform monitoring

### Phase 2 (In Progress)
- 📋 API handlers with RBAC enforcement
- 📋 Integration tests for security
- 📋 Documentation

### Phase 3 (Future)
- Multi-factor authentication (MFA)
- Hardware security module (HSM) integration
- Advanced threat detection (ML-based)
- Automated security scanning
- Compliance report generation
- Security information and event management (SIEM) integration

## References

- **OWASP Top 10**: https://owasp.org/www-project-top-ten/
- **NIST Cybersecurity Framework**: https://www.nist.gov/cyberframework
- **CIS Controls**: https://www.cisecurity.org/controls
- **GDPR**: https://gdpr.eu/
- **SOC 2**: https://www.aicpa.org/interestareas/frc/assuranceadvisoryservices/socforserviceorganizations.html

---

**Last Updated**: 2025-10-06
**Review Cycle**: Quarterly
**Next Review**: 2026-01-06
core: init repo and codebase 2025-10-07 10:59:52 +01:00			`# Security Considerations for Control Center Enhancements`

			`## Overview`

			`This document outlines the security architecture and considerations for the control-center enhancements, including KMS SSH key management, mode-based RBAC, and platform service monitoring.`

			`## 1. SSH Key Management Security`

			`### 1.1 Key Storage Security`

			`Implementation:`
			`- Private keys encrypted at rest using AES-256-GCM in KMS`
			`- Public keys stored in plaintext (as they are meant to be public)`
			`- Private key material never exposed in API responses`
			`- Key IDs used as references, not actual keys`

			`Threat Mitigation:`
			`- ✅ Data at Rest: All private keys encrypted with master encryption key`
			`- ✅ Key Exposure: Private keys only decrypted in memory when needed`
			`- ✅ Key Leakage: Zeroization of key material after use`
			`- ✅ Unauthorized Access: KMS access controlled by RBAC`

			`Best Practices:`
			```rust
			`// Good: Using key ID reference`
			`let key_id = ssh_key_manager.store_ssh_key(name, private, public, purpose, tags).await?;`

			`// Bad: Never do this - exposing private key in logs`
			`tracing::info!("Stored key: {}", private_key); // DON'T DO THIS`
			```

			`### 1.2 Key Rotation Security`

			`Implementation:`
			`- Configurable rotation intervals (default 90 days)`
			`- Grace period for old key usage (default 7 days)`
			`- Automatic rotation scheduling (if enabled)`
			`- Manual rotation support with immediate effect`

			`Threat Mitigation:`
			`- ✅ Key Compromise: Regular rotation limits exposure window`
			`- ✅ Stale Keys: Automated detection of keys due for rotation`
			`- ✅ Rotation Failures: Graceful degradation with error logging`

			`Rotation Policy:`
			```toml
			`[kms.ssh_keys]`
			`rotation_enabled = true`
			`rotation_interval_days = 90 # Enterprise: 30, Dev: 180`
			`grace_period_days = 7 # Time to update deployed keys`
			`auto_rotate = false # Manual approval recommended`
			```

			`### 1.3 Audit Logging`

			`Logged Events:`
			`- SSH key creation (who, when, purpose)`
			`- SSH key retrieval (who accessed, when)`
			`- SSH key rotation (old key ID, new key ID)`
			`- SSH key deletion (who deleted, when)`
			`- Failed access attempts`

			`Audit Entry Structure:`
			```rust
			`pub struct SshKeyAuditEntry {`
			`pub timestamp: DateTime<Utc>,`
			`pub key_id: String,`
			`pub action: SshKeyAction,`
			`pub user: Option<String>, // User who performed action`
			`pub ip_address: Option<String>, // Source IP`
			`pub success: bool,`
			`pub error_message: Option<String>,`
			`}`
			```

			`Threat Mitigation:`
			`- ✅ Unauthorized Access: Full audit trail for forensics`
			`- ✅ Insider Threats: User attribution for all actions`
			`- ✅ Compliance: GDPR/SOC2 audit log requirements met`

			`Audit Log Retention:`
			`- In-memory: Last 10,000 entries`
			`- Persistent: SurrealDB with 1-year retention`
			`- Compliance mode: 7-year retention (configurable)`

			`### 1.4 Key Fingerprinting`

			`Implementation:`
			```rust
			`fn calculate_fingerprint(public_key: &[u8]) -> Result<String, KmsError> {`
			`use sha2::{Sha256, Digest};`
			`let mut hasher = Sha256::new();`
			`hasher.update(public_key);`
			`let result = hasher.finalize();`
			`Ok(format!("SHA256:{}", base64::encode(&result[..16])))`
			`}`
			```

			`Security Benefits:`
			`- Verify key integrity`
			`- Detect key tampering`
			`- Match deployed keys to KMS records`

			`## 2. RBAC Security`

			`### 2.1 Execution Modes`

			`Security Model by Mode:`

			`\| Mode \| Security Level \| Use Case \| Audit Required \|`
			`\|------\|---------------\|----------\|----------------\|`
			`\| Solo \| Low \| Single developer \| No \|`
			`\| MultiUser \| Medium \| Small teams \| Optional \|`
			`\| CICD \| Medium \| Automation \| Yes \|`
			`\| Enterprise \| High \| Production \| Mandatory \|`

			`Mode-Specific Security:`

			`#### Solo Mode`
			```rust
			`// Solo mode: All users are admin`
			`// Security: Trust-based, no RBAC checks`
			`if mode == ExecutionMode::Solo {`
			`return true; // Allow all operations`
			`}`
			```

			`Risks:`
			`- No access control`
			`- No audit trail`
			`- Single point of failure`

			`Mitigations:`
			`- Only for development environments`
			`- Network isolation required`
			`- Regular backups`

			`#### MultiUser Mode`
			```rust
			`// Multi-user: Role-based access control`
			`let permissions = rbac_manager.get_user_permissions(&user).await;`
			`if !permissions.contains(&required_permission) {`
			`return Err(RbacError::PermissionDenied);`
			`}`
			```

			`Security Features:`
			`- Role-based permissions`
			`- Optional audit logging`
			`- Session management`

			`#### CICD Mode`
			```rust
			`// CICD: Service account focused`
			`// All actions logged for automation tracking`
			`if mode == ExecutionMode::CICD {`
			`audit_log.log_automation_action(service_account, action).await;`
			`}`
			```

			`Security Features:`
			`- Service account isolation`
			`- Mandatory audit logging`
			`- Token-based authentication`
			`- Short-lived credentials`

			`#### Enterprise Mode`
			```rust
			`// Enterprise: Full security`
			`// - Mandatory audit logging`
			`// - Stricter session timeouts`
			`// - Compliance reports`
			`if mode == ExecutionMode::Enterprise {`
			`audit_log.log_with_compliance(user, action, compliance_tags).await;`
			`}`
			```

			`Security Features:`
			`- Full RBAC enforcement`
			`- Comprehensive audit logging`
			`- Compliance reporting`
			`- Role assignment approval workflow`

			`### 2.2 Permission System`

			`Permission Levels:`
			```rust
			`Role::Admin => 100 // Full access`
			`Role::Operator => 80 // Deploy & manage`
			`Role::Developer => 60 // Read + dev deploy`
			`Role::ServiceAccount => 50 // Automation`
			`Role::Auditor => 40 // Read + audit`
			`Role::Viewer => 20 // Read-only`
			```

			`Action Security Levels:`
			```rust
			`Action::Delete => 100 // Destructive, admin only`
			`Action::Manage => 80 // Service management`
			`Action::Deploy => 80 // Deploy to production`
			`Action::Create => 60 // Create resources`
			`Action::Update => 60 // Modify resources`
			`Action::Execute => 50 // Execute operations`
			`Action::Audit => 40 // View audit logs`
			`Action::Read => 20 // View resources`
			```

			`Permission Check:`
			```rust
			`pub fn can_perform(&self, required_level: u8) -> bool {`
			`self.permission_level() >= required_level`
			`}`
			```

			`Security Guarantees:`
			`- ✅ Least privilege by default (Viewer role)`
			`- ✅ Hierarchical permissions (higher roles include lower)`
			`- ✅ Explicit deny for unknown resources`
			`- ✅ No permission escalation without admin`

			`### 2.3 Session Security`

			`Session Configuration:`
			```toml
			`[security]`
			`session_timeout_minutes = 60 # Solo/MultiUser`
			`session_timeout_minutes = 30 # Enterprise`
			`max_sessions_per_user = 5`
			`failed_login_lockout_attempts = 5`
			`failed_login_lockout_duration_minutes = 15`
			```

			`Session Lifecycle:`
			`1. User authenticates → JWT token issued`
			`2. Token includes: user_id, role, issued_at, expires_at`
			`3. Middleware validates token on each request`
			`4. Session tracked in Redis/RocksDB`
			`5. Session invalidated on logout or timeout`

			`Security Features:`
			`- JWT with RSA-2048 signatures`
			`- Refresh token rotation`
			`- Session fixation prevention`
			`- Concurrent session limits`

			`Threat Mitigation:`
			`- ✅ Session Hijacking: Short-lived tokens (1 hour)`
			`- ✅ Token Replay: One-time refresh tokens`
			`- ✅ Brute Force: Account lockout after 5 failures`
			`- ✅ Session Fixation: New session ID on login`

			`### 2.4 Middleware Security`

			`RBAC Middleware Flow:`
			```
			`Request → Auth Middleware → RBAC Middleware → Handler`
			`↓ ↓`
			`Extract User Check Permission`
			`from JWT Token (role + resource + action)`
			`↓`
			`Allow / Deny`
			```

			`Middleware Implementation:`
			```rust
			`pub async fn check_permission(`
			`State(state): State<Arc<RbacMiddleware>>,`
			`resource: Resource,`
			`action: Action,`
			`mut req: Request,`
			`next: Next,`
			`) -> Result<Response, RbacError> {`
			`let user = req.extensions()`
			`.get::<User>()`
			`.ok_or(RbacError::UserNotFound("No user in request".to_string()))?;`

			`if !state.rbac_manager.check_permission(&user, resource, action).await {`
			`return Err(RbacError::PermissionDenied);`
			`}`

			`Ok(next.run(req).await)`
			`}`
			```

			`Security Guarantees:`
			`- ✅ All API endpoints protected by default`
			`- ✅ Permission checked before handler execution`
			`- ✅ User context available in handlers`
			`- ✅ Failed checks logged for audit`

			`## 3. Platform Monitoring Security`

			`### 3.1 Service Access Security`

			`Internal URLs Only:`
			```toml
			`[platform]`
			`orchestrator_url = "http://localhost:8080" # Not exposed externally`
			`coredns_url = "http://localhost:9153"`
			`gitea_url = "http://localhost:3000"`
			`oci_registry_url = "http://localhost:5000"`
			```

			`Network Security:`
			`- All services on localhost or internal network`
			`- No external exposure of monitoring endpoints`
			`- Firewall rules to prevent external access`

			`Threat Mitigation:`
			`- ✅ External Scanning: Services not reachable from internet`
			`- ✅ DDoS: Internal-only access limits attack surface`
			`- ✅ Data Exfiltration: Monitoring data not exposed externally`

			`### 3.2 Health Check Security`

			`Timeout Protection:`
			```rust
			`let client = Client::builder()`
			`.timeout(std::time::Duration::from_secs(5)) // Prevent hanging`
			`.build()`
			`.unwrap();`
			```

			`Error Handling:`
			```rust
			`// Never expose internal errors to users`
			`Err(e) => {`
			`// Log detailed error internally`
			`tracing::error!("Health check failed for {}: {}", service, e);`

			`// Return generic error externally`
			`ServiceStatus {`
			`status: HealthStatus::Unhealthy,`
			`error_message: Some("Service unavailable".to_string()), // Generic`
			`..`
			`}`
			`}`
			```

			`Threat Mitigation:`
			`- ✅ Timeout Attacks: 5-second timeout prevents resource exhaustion`
			`- ✅ Information Disclosure: Error messages sanitized`
			`- ✅ Resource Exhaustion: Parallel checks with concurrency limits`

			`### 3.3 Service Control Security`

			`RBAC-Protected Service Control:`
			```rust
			`// Only Operator or Admin can start/stop services`
			`#[axum::debug_handler]`
			`pub async fn start_service(`
			`State(state): State<AppState>,`
			`Extension(user): Extension<User>,`
			`Path(service_type): Path<String>,`
			`) -> Result<StatusCode, ApiError> {`
			`// Check permission`
			`if !rbac_manager.check_permission(`
			`&user,`
			`Resource::Service,`
			`Action::Manage,`
			`).await {`
			`return Err(ApiError::PermissionDenied);`
			`}`

			`// Start service`
			`service_manager.start_service(&service_type).await?;`

			`// Audit log`
			`audit_log.log_service_action(user, service_type, "start").await;`

			`Ok(StatusCode::OK)`
			`}`
			```

			`Security Guarantees:`
			`- ✅ Only authorized users can control services`
			`- ✅ All service actions logged`
			`- ✅ Graceful degradation on service failure`

			`## 4. Threat Model`

			`### 4.1 High-Risk Threats`

			`#### Threat: SSH Private Key Exposure`
			`Attack Vector: Attacker gains access to KMS database`

			`Mitigations:`
			`- Private keys encrypted at rest with master key`
			`- Master key stored in hardware security module (HSM) or KMS`
			`- Key access audited and rate-limited`
			`- Zeroization of decrypted keys in memory`

			`Detection:`
			`- Audit log monitoring for unusual key access patterns`
			`- Alerting on bulk key retrievals`

			`#### Threat: Privilege Escalation`
			`Attack Vector: Lower-privileged user attempts to gain admin access`

			`Mitigations:`
			`- Role assignment requires Admin role`
			`- Mode switching requires Admin role`
			`- Middleware enforces permissions on every request`
			`- No client-side permission checks (server-side only)`

			`Detection:`
			`- Failed permission checks logged`
			`- Alerting on repeated permission denials`

			`#### Threat: Session Hijacking`
			`Attack Vector: Attacker steals JWT token`

			`Mitigations:`
			`- Short-lived access tokens (1 hour)`
			`- Refresh token rotation`
			`- Secure HTTP-only cookies (recommended)`
			`- IP address binding (optional)`

			`Detection:`
			`- Unusual login locations`
			`- Concurrent sessions from different IPs`

			`### 4.2 Medium-Risk Threats`

			`#### Threat: Service Impersonation`
			`Attack Vector: Malicious service pretends to be legitimate platform service`

			`Mitigations:`
			`- Service URLs configured in config file (not dynamic)`
			`- TLS certificate validation (if HTTPS)`
			`- Service authentication tokens`

			`Detection:`
			`- Health check failures`
			`- Metrics anomalies`

			`#### Threat: Audit Log Tampering`
			`Attack Vector: Attacker modifies audit logs to hide tracks`

			`Mitigations:`
			`- Audit logs write-only`
			`- Logs stored in tamper-evident database (SurrealDB)`
			`- Hash chain for log integrity`
			`- Offsite log backup`

			`Detection:`
			`- Hash chain verification`
			`- Log gap detection`

			`### 4.3 Low-Risk Threats`

			`#### Threat: Information Disclosure via Error Messages`
			`Attack Vector: Error messages leak internal information`

			`Mitigations:`
			`- Generic error messages for users`
			`- Detailed errors only in server logs`
			`- Error message sanitization`

			`Detection:`
			`- Code review for error handling`
			`- Automated scanning for sensitive data in responses`

			`## 5. Compliance Considerations`

			`### 5.1 GDPR Compliance`

			`Personal Data Handling:`
			`- User information: username, email, IP addresses`
			`- Retention: Audit logs kept for required period`
			`- Right to erasure: User deletion deletes all associated data`

			`Implementation:`
			```rust
			`// Delete user and all associated data`
			`pub async fn delete_user(&self, user_id: &str) -> Result<(), RbacError> {`
			`// Delete user SSH keys`
			`for key in self.list_user_ssh_keys(user_id).await? {`
			`self.delete_ssh_key(&key.key_id).await?;`
			`}`

			`// Anonymize audit logs (retain for compliance, remove PII)`
			`self.anonymize_user_audit_logs(user_id).await?;`

			`// Delete user record`
			`self.delete_user_record(user_id).await?;`

			`Ok(())`
			`}`
			```

### 5.2 SOC 2 Compliance

**Security Controls**:
- ✅ Access control (RBAC)
- ✅ Audit logging (all actions logged)
- ✅ Encryption at rest (KMS)
- ✅ Encryption in transit (HTTPS recommended)
- ✅ Session management (timeout, MFA support)

**Monitoring & Alerting**:
- ✅ Service health monitoring
- ✅ Failed login tracking
- ✅ Permission denial alerting
- ✅ Unusual activity detection

### 5.3 PCI DSS (if applicable)

**Requirements**:
- ✅ Encrypt cardholder data (use KMS for keys)
- ✅ Maintain access control (RBAC)
- ✅ Track and monitor access (audit logs)
- ✅ Regularly test security (integration tests)

## 6. Security Best Practices

### 6.1 Development

**Code Review Checklist**:
- [ ] All API endpoints have RBAC middleware
- [ ] No hardcoded secrets or keys
- [ ] Error messages don't leak sensitive info
- [ ] Audit logging for sensitive operations
- [ ] Input validation on all user inputs
- [ ] SQL injection prevention (use parameterized queries)
- [ ] XSS prevention (escape user-supplied data on output)

**Testing**:
- Unit tests for permission checks
- Integration tests for RBAC enforcement
- Penetration testing for production deployments

### 6.2 Deployment

**Production Checklist**:
- [ ] Change default admin password
- [ ] Enable HTTPS with valid certificate
- [ ] Configure firewall rules (internal services only)
- [ ] Set appropriate execution mode (Enterprise for production)
- [ ] Enable audit logging
- [ ] Configure session timeout (30 minutes for Enterprise)
- [ ] Enable rate limiting
- [ ] Set up log monitoring and alerting
- [ ] Regular security updates
- [ ] Backup encryption keys

### 6.3 Operations

**Incident Response**:
1. Detection: Monitor audit logs for anomalies
2. Containment: Revoke compromised credentials
3. Eradication: Rotate affected SSH keys
4. Recovery: Restore from backup if needed
5. Lessons Learned: Update security controls

**Key Rotation Schedule**:
- SSH keys: Every 90 days (Enterprise: 30 days)
- JWT signing keys: Every 180 days
- Master encryption key: Every 365 days
- Service account tokens: Every 30 days
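
The schedule above can be expressed as a small lookup plus a due check, for example to drive a periodic "rotation due" report. The `Credential` enum and the two functions below are hypothetical names introduced for this sketch; only the intervals come from the schedule.

```rust
use std::time::Duration;

/// Hypothetical credential classes matching the rotation schedule.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Credential {
    SshKey { enterprise: bool },
    JwtSigningKey,
    MasterEncryptionKey,
    ServiceAccountToken,
}

const DAY: Duration = Duration::from_secs(24 * 60 * 60);

/// Rotation interval for each credential class, in line with the schedule.
pub fn rotation_interval(credential: Credential) -> Duration {
    let days: u32 = match credential {
        Credential::SshKey { enterprise: true } => 30,
        Credential::SshKey { enterprise: false } => 90,
        Credential::JwtSigningKey => 180,
        Credential::MasterEncryptionKey => 365,
        Credential::ServiceAccountToken => 30,
    };
    DAY * days
}

/// A credential is due once its age reaches the rotation interval.
pub fn is_due(credential: Credential, age: Duration) -> bool {
    age >= rotation_interval(credential)
}
```

For instance, `is_due(Credential::SshKey { enterprise: true }, age)` becomes true once `age` reaches 30 days, while the same key outside Enterprise mode is not due until 90 days.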

## 7. Security Metrics

### 7.1 Monitoring Metrics

**Authentication**:
- Failed login attempts per user
- Concurrent sessions per user
- Session duration (average, p95, p99)

**Authorization**:
- Permission denials per user
- Permission denials per resource
- Role assignments per day

**Audit**:
- SSH key accesses per day
- SSH key rotations per month
- Audit log retention compliance

**Services**:
- Service health check success rate
- Service response times (p50, p95, p99)
- Service dependency failures

### 7.2 Alerting Thresholds

**Critical Alerts**:
- Service health: >3 failures in 5 minutes
- Failed logins: >10 attempts in 1 minute
- Permission denials: >50 in 1 minute
- SSH key bulk retrieval: >10 keys in 1 minute

**Warning Alerts**:
- Service degraded: response time >1 second
- Session timeout rate: >10% of sessions
- Audit log storage: >80% capacity
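
Thresholds of the form "more than N events in a time window" can be evaluated with a sliding-window counter. The sketch below is one possible approach, not the platform's alerting implementation; `SlidingWindowAlert` and its methods are hypothetical. Timestamps are passed in explicitly so the logic can be tested without sleeping.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Hypothetical alert that fires when more than `threshold` events
/// fall inside the trailing `window`.
pub struct SlidingWindowAlert {
    threshold: usize,
    window: Duration,
    events: VecDeque<Instant>, // timestamps of events still inside the window
}

impl SlidingWindowAlert {
    pub fn new(threshold: usize, window: Duration) -> Self {
        Self { threshold, window, events: VecDeque::new() }
    }

    /// Record an event observed at `now`. Returns true when the number of
    /// events within the window, including this one, exceeds the threshold.
    pub fn record(&mut self, now: Instant) -> bool {
        // Drop events that have aged out of the window.
        while let Some(&oldest) = self.events.front() {
            if now.saturating_duration_since(oldest) >= self.window {
                self.events.pop_front();
            } else {
                break;
            }
        }
        self.events.push_back(now);
        self.events.len() > self.threshold
    }
}
```

A failed-login alert matching the critical threshold would be `SlidingWindowAlert::new(10, Duration::from_secs(60))`, kept per user or per source address depending on what is being tracked.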

## 8. Security Roadmap

### Phase 1 (Completed)
- ✅ SSH key storage with encryption
- ✅ Mode-based RBAC
- ✅ Audit logging
- ✅ Platform monitoring

### Phase 2 (In Progress)
- 📋 API handlers with RBAC enforcement
- 📋 Integration tests for security
- 📋 Documentation

### Phase 3 (Future)
- Multi-factor authentication (MFA)
- Hardware security module (HSM) integration
- Advanced threat detection (ML-based)
- Automated security scanning
- Compliance report generation
- Security information and event management (SIEM) integration

## References

- OWASP Top 10: https://owasp.org/www-project-top-ten/
- NIST Cybersecurity Framework: https://www.nist.gov/cyberframework
- CIS Controls: https://www.cisecurity.org/controls
- GDPR: https://gdpr.eu/
- SOC 2: https://www.aicpa.org/interestareas/frc/assuranceadvisoryservices/socforserviceorganizations.html

---

**Last Updated**: 2025-10-06
**Review Cycle**: Quarterly
**Next Review**: 2026-01-06