# VAPORA Backup Strategy Comprehensive backup and data protection strategy for VAPORA infrastructure. --- ## Overview **Purpose**: Protect against data loss, corruption, and service interruptions **Coverage**: - Database backups (SurrealDB) - Configuration backups (ConfigMaps, Secrets) - Application state - Infrastructure-as-Code - Container images **Success Metrics**: - RPO (Recovery Point Objective): 1 hour (lose at most 1 hour of data) - RTO (Recovery Time Objective): 4 hours (restore service within 4 hours) - Backup availability: 99.9% (backups always available when needed) - Backup validation: 100% (all backups tested monthly) --- ## Backup Architecture ### What Gets Backed Up ``` VAPORA Backup Scope Critical (Daily): ├── Database │ ├── SurrealDB data │ ├── User data │ ├── Project/task data │ └── Audit logs ├── Configuration │ ├── ConfigMaps │ ├── Secrets │ └── Deployment manifests └── Infrastructure Code ├── Provisioning/Nickel configs ├── Kubernetes manifests └── Scripts Important (Weekly): ├── Application logs ├── Metrics data └── Documentation updates Optional (As-needed): ├── Container images ├── Build artifacts └── Development configurations ``` ### Backup Storage Strategy ``` PRIMARY BACKUP LOCATION ├── Storage: Cloud object storage (S3/GCS/Azure Blob) ├── Frequency: Hourly for database, daily for configs ├── Retention: 30 days rolling window ├── Encryption: AES-256 at rest └── Redundancy: Geo-replicated to different region SECONDARY BACKUP LOCATION (for critical data) ├── Storage: Different cloud provider or on-prem ├── Frequency: Daily ├── Retention: 90 days ├── Purpose: Protection against primary provider outage └── Testing: Restore tested weekly ARCHIVE LOCATION (compliance/long-term) ├── Storage: Cold storage (Glacier, Azure Archive) ├── Frequency: Monthly ├── Retention: 7 years (adjust per compliance needs) ├── Purpose: Compliance & legal holds └── Accessibility: ~4 hours to retrieve ``` --- ## Database Backup Procedures ### SurrealDB Backup **Backup Method**: Full database dump via SurrealDB export ```bash # Export full database kubectl exec -n vapora surrealdb-pod -- \ surreal export --conn ws://localhost:8000 \ --user root \ --pass "$DB_PASSWORD" \ --output backup-$(date +%Y%m%d-%H%M%S).sql # Expected size: 100MB-1GB (depending on data) # Expected time: 5-15 minutes ``` **Automated Backup Setup** ```bash # Create backup script: provisioning/scripts/backup-database.nu def backup_database [output_dir: string] { let timestamp = (date now | format date %Y%m%d-%H%M%S) let backup_file = $"($output_dir)/vapora-db-($timestamp).sql" print $"Starting database backup to ($backup_file)..." # Export database kubectl exec -n vapora deployment/vapora-backend -- \ surreal export \ --conn ws://localhost:8000 \ --user root \ --pass $env.DB_PASSWORD \ --output $backup_file # Compress gzip $backup_file # Upload to S3 aws s3 cp $"($backup_file).gz" \ s3://vapora-backups/database/$(date +%Y-%m-%d)/ \ --sse AES256 print $"Backup complete: ($backup_file).gz" } ``` **Backup Schedule** ```yaml # Kubernetes CronJob for hourly backups apiVersion: batch/v1 kind: CronJob metadata: name: database-backup namespace: vapora spec: schedule: "0 * * * *" # Every hour jobTemplate: spec: template: spec: containers: - name: backup image: vapora/backup-tools:latest command: - /scripts/backup-database.sh env: - name: DB_PASSWORD valueFrom: secretKeyRef: name: db-credentials key: password - name: AWS_ACCESS_KEY_ID valueFrom: secretKeyRef: name: aws-credentials key: access-key restartPolicy: OnFailure ``` ### Backup Retention Policy ``` Hourly backups (last 24 hours): ├── Keep: All hourly backups ├── Purpose: Granular recovery options └── Storage: Standard (fast access) Daily backups (last 30 days): ├── Keep: 1 per day at midnight UTC ├── Purpose: Daily recovery options └── Storage: Standard (fast access) Weekly backups (last 90 days): ├── Keep: 1 per Sunday at midnight UTC ├── Purpose: Medium-term recovery └── Storage: Standard Monthly backups (7 years): ├── Keep: 1 per month on 1st at midnight UTC ├── Purpose: Compliance & long-term recovery └── Storage: Archive (cold storage) ``` ### Backup Verification ```bash # Daily backup verification def verify_backup [backup_file: string] { print $"Verifying backup: ($backup_file)" # 1. Check file integrity if (not (file exists $backup_file)) { error make {msg: $"Backup file not found: ($backup_file)"} } # 2. Check file size (should be > 1MB) let size = (ls $backup_file | get 0.size) if ($size < 1000000) { error make {msg: $"Backup file too small: ($size) bytes"} } # 3. Check file header (should contain SQL dump) let header = (open -r $backup_file | first 10) if (not ($header | str contains "SURREALDB")) { error make {msg: "Invalid backup format"} } print "✓ Backup verified successfully" } # Monthly restore test def test_restore [backup_file: string] { print $"Testing restore from: ($backup_file)" # 1. Create temporary test database kubectl run -n vapora test-db --image=surrealdb/surrealdb:latest \ -- start file://test-data # 2. Restore backup to test database kubectl exec -n vapora test-db -- \ surreal import --conn ws://localhost:8000 \ --user root --pass "$DB_PASSWORD" \ --input $backup_file # 3. Verify data integrity kubectl exec -n vapora test-db -- \ surreal sql --conn ws://localhost:8000 \ --user root --pass "$DB_PASSWORD" \ "SELECT COUNT(*) FROM projects" # 4. Compare record counts # Should match production database # 5. Cleanup test database kubectl delete pod -n vapora test-db print "✓ Restore test passed" } ``` --- ## Configuration Backup ### ConfigMap & Secret Backups ```bash # Backup all ConfigMaps kubectl get configmap -n vapora -o yaml > configmaps-backup-$(date +%Y%m%d).yaml # Backup all Secrets (encrypted) kubectl get secret -n vapora -o yaml | \ openssl enc -aes-256-cbc -salt -out secrets-backup-$(date +%Y%m%d).yaml.enc # Upload to S3 aws s3 sync . s3://vapora-backups/k8s-configs/$(date +%Y-%m-%d)/ \ --exclude "*" --include "*.yaml" --include "*.yaml.enc" \ --sse AES256 ``` **Automated Nushell Script** ```nushell def backup_k8s_configs [output_dir: string] { let timestamp = (date now | format date %Y%m%d) let config_dir = $"($output_dir)/k8s-configs-($timestamp)" mkdir $config_dir # Backup ConfigMaps kubectl get configmap -n vapora -o yaml > $"($config_dir)/configmaps.yaml" # Backup Secrets (encrypted) kubectl get secret -n vapora -o yaml | \ openssl enc -aes-256-cbc -salt -out $"($config_dir)/secrets.yaml.enc" # Backup Deployments kubectl get deployments -n vapora -o yaml > $"($config_dir)/deployments.yaml" # Backup Services kubectl get services -n vapora -o yaml > $"($config_dir)/services.yaml" # Backup all to archive tar -czf $"($config_dir).tar.gz" $config_dir # Upload aws s3 cp $"($config_dir).tar.gz" \ s3://vapora-backups/configs/ \ --sse AES256 print "✓ K8s configs backed up" } ``` --- ## Infrastructure-as-Code Backups ### Git Repository Backups **Primary**: GitHub (with backup organization) ```bash # Mirror repository to backup location git clone --mirror https://github.com/your-org/vapora.git \ vapora-mirror.git # Push to backup location cd vapora-mirror.git git push --mirror https://backup-git-server/vapora-mirror.git ``` **Backup Schedule** ```yaml # Daily mirror push */6 * * * * /scripts/backup-git-repo.sh ``` ### Provisioning Code Backups ```bash # Backup Nickel configs & scripts def backup_provisioning_code [output_dir: string] { let timestamp = (date now | format date %Y%m%d) # Create backup tar -czf $"($output_dir)/provisioning-($timestamp).tar.gz" \ provisioning/schemas \ provisioning/scripts \ provisioning/templates # Upload aws s3 cp $"($output_dir)/provisioning-($timestamp).tar.gz" \ s3://vapora-backups/provisioning/ \ --sse AES256 } ``` --- ## Application State Backups ### Persistent Volume Backups If using persistent volumes for data: ```bash # Backup PersistentVolumeClaims def backup_pvcs [namespace: string] { let pvcs = (kubectl get pvc -n $namespace -o json | from json).items for pvc in $pvcs { let pvc_name = $pvc.metadata.name let volume_size = $pvc.spec.resources.requests.storage print $"Backing up PVC: ($pvc_name) (($volume_size))" # Create snapshot (cloud-specific) aws ec2 create-snapshot \ --volume-id $pvc_name \ --description $"VAPORA backup $(date +%Y-%m-%d)" } } ``` ### Application Logs ```bash # Export logs for archive def backup_application_logs [output_dir: string] { let timestamp = (date now | format date %Y%m%d) # Export last 7 days of logs kubectl logs deployment/vapora-backend -n vapora \ --since=168h > $"($output_dir)/backend-logs-($timestamp).log" kubectl logs deployment/vapora-agents -n vapora \ --since=168h > $"($output_dir)/agents-logs-($timestamp).log" # Compress and upload gzip $"($output_dir)/*.log" aws s3 sync $output_dir s3://vapora-backups/logs/ \ --exclude "*" --include "*.log.gz" \ --sse AES256 } ``` --- ## Container Image Backups ### Docker Image Registry ```bash # Tag images for backup docker tag vapora/backend:latest vapora/backend:backup-$(date +%Y%m%d) docker tag vapora/agents:latest vapora/agents:backup-$(date +%Y%m%d) docker tag vapora/llm-router:latest vapora/llm-router:backup-$(date +%Y%m%d) # Push to backup registry docker push backup-registry/vapora/backend:backup-$(date +%Y%m%d) docker push backup-registry/vapora/agents:backup-$(date +%Y%m%d) docker push backup-registry/vapora/llm-router:backup-$(date +%Y%m%d) # Retention: Keep last 30 days of images ``` --- ## Backup Monitoring ### Backup Health Checks ```bash # Daily backup status check def check_backup_status [] { print "=== Backup Status Report ===" # 1. Check latest database backup let latest_db = (aws s3 ls s3://vapora-backups/database/ \ --recursive | tail -1) let db_age = (date now) - ($latest_db | from json | get LastModified) if ($db_age > 2h) { print "⚠️ Database backup stale (> 2 hours old)" } else { print "✓ Database backup current" } # 2. Check config backup let config_count = (aws s3 ls s3://vapora-backups/configs/ | wc -l) if ($config_count > 0) { print "✓ Config backups present" } else { print "❌ No config backups found" } # 3. Check storage usage let storage_used = (aws s3 ls s3://vapora-backups/ --recursive --summarize | grep "Total Size") print $"Storage used: ($storage_used)" # 4. Check backup encryption let objects = (aws s3api list-objects-v2 --bucket vapora-backups --query 'Contents[*]') # All should have ServerSideEncryption: AES256 print "=== End Report ===" } ``` ### Backup Alerts Configure alerts for: ```yaml Backup Failures: - Threshold: Backup not completed in 2 hours - Action: Alert operations team - Severity: High Backup Staleness: - Threshold: Latest backup > 24 hours old - Action: Alert operations team - Severity: High Storage Capacity: - Threshold: Backup storage > 80% full - Action: Alert & plan cleanup - Severity: Medium Restore Test Failures: - Threshold: Monthly restore test fails - Action: Alert & investigate - Severity: Critical ``` --- ## Backup Testing & Validation ### Monthly Restore Test **Schedule**: First Sunday of each month at 02:00 UTC ```bash def monthly_restore_test [] { print "Starting monthly restore test..." # 1. Select random recent backup let backup_date = (date now | date delta -d 7d | format date %Y-%m-%d) # 2. Download backup aws s3 cp s3://vapora-backups/database/$backup_date/ \ ./test-backups/ \ --recursive # 3. Restore to test environment # (See Database Recovery Procedures) # 4. Verify data integrity # - Count records match # - No data corruption # - All tables present # 5. Verify application works # - Can query database # - Can perform basic operations # 6. Document results # - Success/failure # - Any issues found # - Time taken print "✓ Restore test completed" } ``` ### Backup Audit Report **Quarterly**: Generate backup audit report ```bash def quarterly_backup_audit [] { print "=== Quarterly Backup Audit Report ===" print $"Report Date: (date now | format date %Y-%m-%d)" print "" print "1. Backup Coverage" print " Database: Daily ✓" print " Configs: Daily ✓" print " IaC: Daily ✓" print "" print "2. Restore Tests (Last Quarter)" print " Tests Performed: 3" print " Tests Passed: 3" print " Average Restore Time: 2.5 hours" print "" print "3. Storage Usage" # Calculate storage per category print "4. Backup Age Distribution" # Show age distribution of backups print "5. Incidents & Issues" # Any backup-related incidents print "6. Recommendations" # Any needed improvements } ``` --- ## Backup Security ### Encryption - ✅ All backups encrypted at rest (AES-256) - ✅ All backups encrypted in transit (HTTPS/TLS) - ✅ Encryption keys managed by cloud provider or KMS - ✅ Separate keys for database and config backups ### Access Control ``` Backup Access Policy: Read Access: - Operations team - Disaster recovery team - Compliance/audit team Write Access: - Automated backup system only - Require 2FA for manual backups Delete/Modify Access: - Require 2 approvals - Audit logging enabled - 24-hour delay before deletion ``` ### Audit Logging ```bash # All backup operations logged - Backup creation: When, size, hash - Backup retrieval: Who, when, what - Restore operations: When, who, from where - Backup deletion: When, who, reason # Logs stored separately and immutable # Example: CloudTrail, S3 access logs, custom logging ``` --- ## Backup Disaster Scenarios ### Scenario 1: Single Database Backup Fails **Impact**: 1-hour data loss risk **Prevention**: - Backup redundancy (multiple copies) - Multiple backup methods - Backup validation after each backup **Recovery**: - Use previous hour's backup - Restore to test environment first - Validate data integrity - Restore to production if good ### Scenario 2: Backup Storage Compromised **Impact**: Data loss + security breach **Prevention**: - Encryption with separate keys - Geographic redundancy - Backup verification signing - Access control restrictions **Recovery**: - Activate secondary backup location - Restore from archive backups - Full security audit ### Scenario 3: Ransomware Infection **Impact**: All recent backups encrypted **Prevention**: - Immutable backups (WORM) - Air-gapped backups (offline) - Archive-only old backups - Regular backup verification **Recovery**: - Use air-gapped backup - Restore to clean environment - Full security remediation ### Scenario 4: Accidental Data Deletion **Impact**: Data loss from point of deletion **Prevention**: - Frequent backups (hourly) - Soft deletes in application - Audit logging **Recovery**: - Restore from backup before deletion time - Point-in-time recovery if available --- ## Backup Checklists ### Daily - [ ] Database backup completed - [ ] Backup size normal (not 0 bytes) - [ ] No backup errors in logs - [ ] Upload to S3 succeeded - [ ] Previous backup still available ### Weekly - [ ] Database backup retention verified - [ ] Config backup completed - [ ] Infrastructure code backed up - [ ] Backup storage space adequate - [ ] Encryption keys accessible ### Monthly - [ ] Restore test scheduled - [ ] Backup audit report generated - [ ] Backup verification successful - [ ] Archive backups created - [ ] Old backups properly retained ### Quarterly - [ ] Full audit report completed - [ ] Backup strategy reviewed - [ ] Team trained on procedures - [ ] RTO/RPO targets met - [ ] Recommendations implemented --- ## Summary **Backup Strategy at a Glance**: | Item | Frequency | Retention | Storage | Encryption | |------|-----------|-----------|---------|-----------| | **Database** | Hourly | 30 days | S3 | AES-256 | | **Config** | Daily | 90 days | S3 | AES-256 | | **IaC** | Daily | 30 days | Git + S3 | AES-256 | | **Images** | Daily | 30 days | Registry | Built-in | | **Archive** | Monthly | 7 years | Glacier | AES-256 | **Key Metrics**: - RPO: 1 hour (lose at most 1 hour of data) - RTO: 4 hours (restore within 4 hours) - Availability: 99.9% (backups available when needed) - Validation: 100% (all backups tested monthly) **Success Criteria**: - ✅ Daily backup completion - ✅ Backup validation passes - ✅ Monthly restore test successful - ✅ No security incidents - ✅ Compliance requirements met