VAPORA Backup Strategy
Comprehensive backup and data protection strategy for VAPORA infrastructure.
Overview
Purpose: Protect against data loss, corruption, and service interruptions
Coverage:
- Database backups (SurrealDB)
- Configuration backups (ConfigMaps, Secrets)
- Application state
- Infrastructure-as-Code
- Container images
Success Metrics:
- RPO (Recovery Point Objective): 1 hour (lose at most 1 hour of data)
- RTO (Recovery Time Objective): 4 hours (restore service within 4 hours)
- Backup availability: 99.9% (backups always available when needed)
- Backup validation: 100% (all backups tested monthly)
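The RPO target can be sanity-checked against the backup schedule with simple arithmetic. The sketch below assumes periodic full exports, where the worst case is a failure just before the next backup finishes: the loss is then roughly one backup interval plus the time a backup takes to complete. The function names are illustrative and not part of the VAPORA tooling.

```shell
#!/bin/sh
# Worst-case data loss in minutes for periodic full backups:
# the interval between backup starts plus the time one backup takes to finish.
worst_case_loss_min() {
    interval_min=$1
    duration_min=$2
    echo $(( interval_min + duration_min ))
}

# Exit status 0 if the schedule meets the RPO, 1 otherwise.
# Arguments: RPO, backup interval, backup duration (all in minutes).
meets_rpo() {
    rpo_min=$1
    loss=$(worst_case_loss_min "$2" "$3")
    [ "$loss" -le "$rpo_min" ]
}
```

Under these assumptions, an hourly schedule with a 15-minute export gives `worst_case_loss_min 60 15`, which is 75 minutes, so a strict 1-hour RPO may call for a shorter interval or a faster export.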
Backup Architecture
What Gets Backed Up
VAPORA Backup Scope
Critical (database hourly, everything else daily):
├── Database
│   ├── SurrealDB data
│   ├── User data
│   ├── Project/task data
│   └── Audit logs
├── Configuration
│   ├── ConfigMaps
│   ├── Secrets
│   └── Deployment manifests
└── Infrastructure Code
    ├── Provisioning/Nickel configs
    ├── Kubernetes manifests
    └── Scripts
Important (Weekly):
├── Application logs
├── Metrics data
└── Documentation updates
Optional (As-needed):
├── Container images
├── Build artifacts
└── Development configurations
Backup Storage Strategy
PRIMARY BACKUP LOCATION
├── Storage: Cloud object storage (S3/GCS/Azure Blob)
├── Frequency: Hourly for database, daily for configs
├── Retention: 30 days rolling window
├── Encryption: AES-256 at rest
└── Redundancy: Geo-replicated to different region
SECONDARY BACKUP LOCATION (for critical data)
├── Storage: Different cloud provider or on-prem
├── Frequency: Daily
├── Retention: 90 days
├── Purpose: Protection against primary provider outage
└── Testing: Restore tested weekly
ARCHIVE LOCATION (compliance/long-term)
├── Storage: Cold storage (Glacier, Azure Archive)
├── Frequency: Monthly
├── Retention: 7 years (adjust per compliance needs)
├── Purpose: Compliance & legal holds
└── Accessibility: ~4 hours to retrieve
Database Backup Procedures
SurrealDB Backup
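Backup Method: Full database dump via SurrealDB export
# Export the full database. "surreal export" takes the target file as a
# positional argument; "-" streams the dump to stdout so it is saved on the
# machine running kubectl rather than inside the pod.
# DB_NAMESPACE and DB_NAME are placeholders for your namespace and database.
kubectl exec -n vapora surrealdb-pod -- \
  surreal export --conn ws://localhost:8000 \
  --user root \
  --pass "$DB_PASSWORD" \
  --ns "$DB_NAMESPACE" --db "$DB_NAME" \
  - > backup-$(date +%Y%m%d-%H%M%S).sql
# Expected size: 100MB-1GB (depending on data)
# Expected time: 5-15 minutes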
Automated Backup Setup
# Create backup script: provisioning/scripts/backup-database.nu
def backup_database [output_dir: string] {
    let timestamp = (date now | format date "%Y%m%d-%H%M%S")
    let day = (date now | format date "%Y-%m-%d")
    let backup_file = $"($output_dir)/vapora-db-($timestamp).sql"
    print $"Starting database backup to ($backup_file)..."
    # Export database: "-" streams the dump to stdout so it is saved locally.
    # DB_NAMESPACE and DB_NAME are placeholder environment variables.
    (^kubectl exec -n vapora deployment/vapora-backend --
        surreal export
        --conn ws://localhost:8000
        --user root
        --pass $env.DB_PASSWORD
        --ns $env.DB_NAMESPACE
        --db $env.DB_NAME
        -
        | save --raw $backup_file)
    # Compress
    ^gzip $backup_file
    # Upload to S3 under a per-day prefix
    ^aws s3 cp $"($backup_file).gz" $"s3://vapora-backups/database/($day)/" --sse AES256
    print $"Backup complete: ($backup_file).gz"
}
Backup Schedule
# Kubernetes CronJob for hourly backups
apiVersion: batch/v1
kind: CronJob
metadata:
  name: database-backup
  namespace: vapora
spec:
  schedule: "0 * * * *" # Every hour
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: backup
              image: vapora/backup-tools:latest
              command:
                - /scripts/backup-database.sh
              env:
                - name: DB_PASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: db-credentials
                      key: password
                - name: AWS_ACCESS_KEY_ID
                  valueFrom:
                    secretKeyRef:
                      name: aws-credentials
                      key: access-key
                # The AWS CLI also needs AWS_SECRET_ACCESS_KEY; supply it the same way.
          restartPolicy: OnFailure
Backup Retention Policy
Hourly backups (last 24 hours):
├── Keep: All hourly backups
├── Purpose: Granular recovery options
└── Storage: Standard (fast access)
Daily backups (last 30 days):
├── Keep: 1 per day at midnight UTC
├── Purpose: Daily recovery options
└── Storage: Standard (fast access)
Weekly backups (last 90 days):
├── Keep: 1 per Sunday at midnight UTC
├── Purpose: Medium-term recovery
└── Storage: Standard
Monthly backups (7 years):
├── Keep: 1 per month on 1st at midnight UTC
├── Purpose: Compliance & long-term recovery
└── Storage: Archive (cold storage)
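The tiers above can be expressed as a single decision per backup. The following is a minimal sketch, not the project's pruning job: it assumes timestamps are passed as Unix epoch seconds in UTC, that `date -u -d @SECONDS` is available (GNU or BusyBox `date`), and it approximates 7 years as 7 × 365 days. The function name `retention_tier` is illustrative.

```shell
#!/bin/sh
# Decide which retention tier a backup falls into, or whether it can expire.
# $1: backup time, $2: current time (both Unix epoch seconds, UTC).
# Prints one of: hourly, daily, weekly, monthly, expire.
retention_tier() {
    ts=$1
    now=$2
    age=$(( now - ts ))
    day=86400
    hour=$(date -u -d "@$ts" +%H)   # 00-23
    dow=$(date -u -d "@$ts" +%u)    # 1 = Monday ... 7 = Sunday
    dom=$(date -u -d "@$ts" +%d)    # 01-31

    if [ "$age" -le "$day" ]; then
        echo hourly                 # keep every backup from the last 24 hours
    elif [ "$hour" = "00" ] && [ "$age" -le $(( 30 * day )) ]; then
        echo daily                  # midnight UTC backup, last 30 days
    elif [ "$hour" = "00" ] && [ "$dow" = "7" ] && [ "$age" -le $(( 90 * day )) ]; then
        echo weekly                 # Sunday midnight UTC backup, last 90 days
    elif [ "$hour" = "00" ] && [ "$dom" = "01" ] && [ "$age" -le $(( 7 * 365 * day )) ]; then
        echo monthly                # 1st-of-month midnight UTC backup, about 7 years
    else
        echo expire
    fi
}
```

A pruning job could call this for each stored backup and delete only those reported as `expire`; checking the tiers from shortest to longest retention keeps a backup as long as any tier still covers it.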
Backup Verification
# Daily backup verification
def verify_backup [backup_file: string] {
    print $"Verifying backup: ($backup_file)"
    # 1. Check that the file exists
    if not ($backup_file | path exists) {
        error make {msg: $"Backup file not found: ($backup_file)"}
    }
    # 2. Check file size (should be > 1MB)
    let size = (ls $backup_file | get 0.size)
    if $size < 1MB {
        error make {msg: $"Backup file too small: ($size)"}
    }
    # 3. Check file header of the uncompressed dump (adjust the marker to
    #    match the header your SurrealDB version writes)
    let header = (open --raw $backup_file | lines | first 10 | str join "\n")
    if not ($header | str contains "SURREALDB") {
        error make {msg: "Invalid backup format"}
    }
    print "✓ Backup verified successfully"
}
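# Monthly restore test
def test_restore [backup_file: string] {
    print $"Testing restore from: ($backup_file)"
    # 1. Create temporary test database and wait until the pod is ready
    (^kubectl run -n vapora test-db --image=surrealdb/surrealdb:latest
        -- start file://test-data)
    ^kubectl wait -n vapora --for=condition=Ready pod/test-db --timeout=120s
    # 2. Restore backup to test database. "surreal import" takes the file as a
    #    positional argument, and the path must exist inside the test-db pod.
    #    DB_NAMESPACE and DB_NAME are placeholder environment variables.
    (^kubectl exec -n vapora test-db --
        surreal import --conn ws://localhost:8000
        --user root --pass $env.DB_PASSWORD
        --ns $env.DB_NAMESPACE --db $env.DB_NAME
        $backup_file)
    # 3. Verify data integrity ("surreal sql" reads queries from stdin)
    let restored = ("SELECT count() FROM projects GROUP ALL;"
        | ^kubectl exec -i -n vapora test-db --
            surreal sql --conn ws://localhost:8000
            --user root --pass $env.DB_PASSWORD
            --ns $env.DB_NAMESPACE --db $env.DB_NAME)
    print $"Restored project count: ($restored)"
    # 4. Compare record counts
    # Should match production database
    # 5. Cleanup test database
    ^kubectl delete pod -n vapora test-db
    print "✓ Restore test passed"
}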
Configuration Backup
ConfigMap & Secret Backups
# Backup all ConfigMaps
kubectl get configmap -n vapora -o yaml > configmaps-backup-$(date +%Y%m%d).yaml
# Backup all Secrets (encrypted with a passphrase read from the environment;
# BACKUP_PASSPHRASE is a placeholder name, and without -pass openssl prompts interactively)
kubectl get secret -n vapora -o yaml | \
  openssl enc -aes-256-cbc -salt -pbkdf2 -pass env:BACKUP_PASSPHRASE \
  -out secrets-backup-$(date +%Y%m%d).yaml.enc
# Upload to S3
aws s3 sync . s3://vapora-backups/k8s-configs/$(date +%Y-%m-%d)/ \
  --exclude "*" --include "*.yaml" --include "*.yaml.enc" \
  --sse AES256
Automated Nushell Script
def backup_k8s_configs [output_dir: string] {
    let timestamp = (date now | format date "%Y%m%d")
    let config_dir = $"($output_dir)/k8s-configs-($timestamp)"
    mkdir $config_dir
    # Backup ConfigMaps
    ^kubectl get configmap -n vapora -o yaml | save --force $"($config_dir)/configmaps.yaml"
    # Backup Secrets (encrypted; BACKUP_PASSPHRASE is a placeholder variable name)
    (^kubectl get secret -n vapora -o yaml
        | ^openssl enc -aes-256-cbc -salt -pbkdf2 -pass env:BACKUP_PASSPHRASE
            -out $"($config_dir)/secrets.yaml.enc")
    # Backup Deployments
    ^kubectl get deployments -n vapora -o yaml | save --force $"($config_dir)/deployments.yaml"
    # Backup Services
    ^kubectl get services -n vapora -o yaml | save --force $"($config_dir)/services.yaml"
    # Bundle everything into one archive
    ^tar -czf $"($config_dir).tar.gz" $config_dir
    # Upload
    ^aws s3 cp $"($config_dir).tar.gz" s3://vapora-backups/configs/ --sse AES256
    print "✓ K8s configs backed up"
}
Infrastructure-as-Code Backups
Git Repository Backups
Primary: GitHub (with backup organization)
# Mirror repository to backup location
git clone --mirror https://github.com/your-org/vapora.git \
vapora-mirror.git
# Push to backup location
cd vapora-mirror.git
git push --mirror https://backup-git-server/vapora-mirror.git
Backup Schedule
# Daily mirror push at midnight UTC
0 0 * * * /scripts/backup-git-repo.sh
Provisioning Code Backups
# Backup Nickel configs & scripts
def backup_provisioning_code [output_dir: string] {
    let timestamp = (date now | format date "%Y%m%d")
    let archive = $"($output_dir)/provisioning-($timestamp).tar.gz"
    # Create backup
    ^tar -czf $archive provisioning/schemas provisioning/scripts provisioning/templates
    # Upload
    ^aws s3 cp $archive s3://vapora-backups/provisioning/ --sse AES256
}
Application State Backups
Persistent Volume Backups
If using persistent volumes for data:
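# Backup PersistentVolumeClaims
def backup_pvcs [namespace: string] {
    let pvcs = (^kubectl get pvc -n $namespace -o json | from json).items
    let day = (date now | format date "%Y-%m-%d")
    for pvc in $pvcs {
        let pvc_name = $pvc.metadata.name
        let volume_size = $pvc.spec.resources.requests.storage
        print $"Backing up PVC: ($pvc_name) \(($volume_size)\)"
        # Resolve the bound PersistentVolume to its cloud volume ID; a PVC name
        # is not a volume ID. Assumes the AWS EBS CSI driver, where
        # spec.csi.volumeHandle holds the EBS volume ID.
        let volume_id = (^kubectl get pv $pvc.spec.volumeName -o json | from json | get spec.csi.volumeHandle)
        # Create snapshot (cloud-specific)
        ^aws ec2 create-snapshot --volume-id $volume_id --description $"VAPORA backup ($pvc_name) ($day)"
    }
}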
Application Logs
# Export logs for archive
def backup_application_logs [output_dir: string] {
    let timestamp = (date now | format date "%Y%m%d")
    # Export last 7 days of logs, compressing each file as it is written
    for name in [backend agents] {
        let log_file = $"($output_dir)/($name)-logs-($timestamp).log"
        ^kubectl logs $"deployment/vapora-($name)" -n vapora --since=168h | save --force $log_file
        ^gzip $log_file
    }
    # Upload compressed logs
    ^aws s3 sync $output_dir s3://vapora-backups/logs/ --exclude "*" --include "*.log.gz" --sse AES256
}
Container Image Backups
Docker Image Registry
# Tag images for the backup registry (the tag must carry the registry prefix to be pushable there)
docker tag vapora/backend:latest backup-registry/vapora/backend:backup-$(date +%Y%m%d)
docker tag vapora/agents:latest backup-registry/vapora/agents:backup-$(date +%Y%m%d)
docker tag vapora/llm-router:latest backup-registry/vapora/llm-router:backup-$(date +%Y%m%d)
# Push to backup registry
docker push backup-registry/vapora/backend:backup-$(date +%Y%m%d)
docker push backup-registry/vapora/agents:backup-$(date +%Y%m%d)
docker push backup-registry/vapora/llm-router:backup-$(date +%Y%m%d)
# Retention: Keep last 30 days of images
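The 30-day image retention can be enforced by filtering tag names, since the `backup-YYYYMMDD` suffix sorts chronologically. This is a sketch under the assumption that the registry's tag list is available one tag per line; how the tags are listed and deleted depends on the registry, and `expired_image_tags` is a hypothetical helper name.

```shell
#!/bin/sh
# Print the backup tags that are older than the cutoff date.
# stdin: one tag per line (for example backup-20240601)
# $1:    cutoff date as YYYYMMDD; tags dated before it are printed
expired_image_tags() {
    cutoff=$1
    while IFS= read -r tag; do
        case $tag in
            backup-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9])
                stamp=${tag#backup-}
                # Eight-digit dates compare correctly as integers
                if [ "$stamp" -lt "$cutoff" ]; then
                    echo "$tag"
                fi
                ;;
        esac
    done
}
```

With GNU `date`, the cutoff could be computed as `date -u -d '30 days ago' +%Y%m%d`. Tags that do not match the `backup-YYYYMMDD` pattern, such as `latest`, are ignored rather than treated as expired.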
Backup Monitoring
Backup Health Checks
# Daily backup status check
def check_backup_status [] {
    print "=== Backup Status Report ==="
    # 1. Check latest database backup (newest LastModified under database/)
    let latest_db = (^aws s3api list-objects-v2 --bucket vapora-backups --prefix database/
        --query "sort_by(Contents, &LastModified)[-1].LastModified" --output json
        | from json | into datetime)
    let db_age = (date now) - $latest_db
    if $db_age > 2hr {
        print "⚠️ Database backup stale (> 2 hours old)"
    } else {
        print "✓ Database backup current"
    }
    # 2. Check config backup
    let config_count = (^aws s3 ls s3://vapora-backups/configs/ | lines | length)
    if $config_count > 0 {
        print "✓ Config backups present"
    } else {
        print "❌ No config backups found"
    }
    # 3. Check storage usage
    let storage_used = (^aws s3 ls s3://vapora-backups/ --recursive --summarize
        | lines | where $it =~ "Total Size" | str join | str trim)
    print $"Storage used: ($storage_used)"
    # 4. Check backup encryption
    # list-objects-v2 does not report encryption; inspect ServerSideEncryption
    # with "aws s3api head-object" for each key (all should be AES256)
    let objects = (^aws s3api list-objects-v2 --bucket vapora-backups --query "Contents[*].Key" --output json | from json)
    print "=== End Report ==="
}
Backup Alerts
Configure alerts for:
Backup Failures:
- Threshold: Backup not completed in 2 hours
- Action: Alert operations team
- Severity: High
Backup Staleness:
- Threshold: Latest backup > 24 hours old
- Action: Alert operations team
- Severity: High
Storage Capacity:
- Threshold: Backup storage > 80% full
- Action: Alert & plan cleanup
- Severity: Medium
Restore Test Failures:
- Threshold: Monthly restore test fails
- Action: Alert & investigate
- Severity: Critical
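The failure and staleness thresholds above reduce to a comparison on the age of the newest backup. The sketch below is one way to classify that age; it assumes both timestamps are supplied as Unix epoch seconds, and the labels `ok`, `missed`, and `stale` are illustrative names rather than existing alert identifiers.

```shell
#!/bin/sh
# Classify backup freshness from the age of the newest backup.
# $1: epoch seconds of the newest backup, $2: current epoch seconds.
# Thresholds follow the alert list: 2 hours (missed run), 24 hours (stale).
backup_alert_level() {
    age=$(( $2 - $1 ))
    if [ "$age" -gt $(( 24 * 3600 )) ]; then
        echo stale
    elif [ "$age" -gt $(( 2 * 3600 )) ]; then
        echo missed
    else
        echo ok
    fi
}
```

A monitoring job could map `missed` and `stale` to the High-severity alerts and page the operations team on either result.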
Backup Testing & Validation
Monthly Restore Test
Schedule: First Sunday of each month at 02:00 UTC
def monthly_restore_test [] {
print "Starting monthly restore test..."
# 1. Select a random day from the last 7 days
let days_back = (random int 1..7)
let backup_date = (((date now) - ($days_back * 1day)) | format date "%Y-%m-%d")
# 2. Download that day's backups
^aws s3 cp $"s3://vapora-backups/database/($backup_date)/" ./test-backups/ --recursive
# 3. Restore to test environment
# (See Database Recovery Procedures)
# 4. Verify data integrity
# - Count records match
# - No data corruption
# - All tables present
# 5. Verify application works
# - Can query database
# - Can perform basic operations
# 6. Document results
# - Success/failure
# - Any issues found
# - Time taken
print "✓ Restore test completed"
}
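Step 4 of the restore test (record counts match, all tables present) can be automated once per-table counts have been collected from both databases. The sketch below assumes each side has been dumped to a text file with one `<table> <count>` pair per line; how those files are produced is left open, and `compare_counts` is a hypothetical helper.

```shell
#!/bin/sh
# Compare per-table record counts from production and the restored copy.
# $1: production counts file, $2: restored counts file.
# Each file holds lines of the form "<table> <count>".
# Prints one line per problem; exit status is 0 only when everything matches.
compare_counts() {
    awk '
        BEGIN { bad = 0 }
        NR == FNR { prod[$1] = $2; next }   # first file: production counts
        {
            seen[$1] = 1
            if (!($1 in prod)) {
                print "unexpected table: " $1
                bad = 1
            } else if (prod[$1] != $2) {
                print "count mismatch: " $1 " prod=" prod[$1] " restored=" $2
                bad = 1
            }
        }
        END {
            for (t in prod) {
                if (!(t in seen)) {
                    print "missing table: " t
                    bad = 1
                }
            }
            exit bad
        }
    ' "$1" "$2"
}
```

Counts taken from a live production database will drift after the backup was made, so the comparison is most meaningful against counts recorded at backup time.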
Backup Audit Report
Quarterly: Generate backup audit report
def quarterly_backup_audit [] {
print "=== Quarterly Backup Audit Report ==="
print $"Report Date: (date now | format date %Y-%m-%d)"
print ""
print "1. Backup Coverage"
print " Database: Hourly ✓"
print " Configs: Daily ✓"
print " IaC: Daily ✓"
print ""
print "2. Restore Tests (Last Quarter)"
print " Tests Performed: 3"
print " Tests Passed: 3"
print " Average Restore Time: 2.5 hours"
print ""
print "3. Storage Usage"
# Calculate storage per category
print "4. Backup Age Distribution"
# Show age distribution of backups
print "5. Incidents & Issues"
# Any backup-related incidents
print "6. Recommendations"
# Any needed improvements
}
Backup Security
Encryption
- ✅ All backups encrypted at rest (AES-256)
- ✅ All backups encrypted in transit (HTTPS/TLS)
- ✅ Encryption keys managed by cloud provider or KMS
- ✅ Separate keys for database and config backups
Access Control
Backup Access Policy:
Read Access:
- Operations team
- Disaster recovery team
- Compliance/audit team
Write Access:
- Automated backup system only
- Require 2FA for manual backups
Delete/Modify Access:
- Require 2 approvals
- Audit logging enabled
- 24-hour delay before deletion
Audit Logging
# All backup operations logged
- Backup creation: When, size, hash
- Backup retrieval: Who, when, what
- Restore operations: When, who, from where
- Backup deletion: When, who, reason
# Logs stored separately and immutable
# Example: CloudTrail, S3 access logs, custom logging
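As an illustration of the "when, size, hash" record for backup creation, the sketch below builds one log line per backup and re-checks a file against its recorded hash. It assumes `sha256sum` and `wc` are available; the key=value line format and both function names are illustrative, not an existing VAPORA log schema.

```shell
#!/bin/sh
# Build one audit-log line for a newly created backup: when, file, size, hash.
# $1: backup file, $2: timestamp supplied by the caller,
#     for example from: date -u +%Y-%m-%dT%H:%M:%SZ
audit_line() {
    file=$1
    when=$2
    size=$(wc -c < "$file" | tr -d ' ')
    hash=$(sha256sum "$file" | cut -d' ' -f1)
    printf 'event=backup_created time=%s file=%s size_bytes=%s sha256=%s\n' \
        "$when" "$file" "$size" "$hash"
}

# Re-check a backup against the hash recorded in its audit line.
# $1: backup file, $2: recorded SHA-256 hex digest. Exit status 0 on match.
verify_against_audit() {
    file=$1
    recorded=$2
    actual=$(sha256sum "$file" | cut -d' ' -f1)
    [ "$actual" = "$recorded" ]
}
```

Recording the hash at creation time lets a later restore confirm that the object retrieved from storage is byte-for-byte the one that was written.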
Backup Disaster Scenarios
Scenario 1: Single Database Backup Fails
Impact: 1-hour data loss risk
Prevention:
- Backup redundancy (multiple copies)
- Multiple backup methods
- Backup validation after each backup
Recovery:
- Use previous hour's backup
- Restore to test environment first
- Validate data integrity
- Restore to production if good
Scenario 2: Backup Storage Compromised
Impact: Data loss + security breach
Prevention:
- Encryption with separate keys
- Geographic redundancy
- Backup verification signing
- Access control restrictions
Recovery:
- Activate secondary backup location
- Restore from archive backups
- Full security audit
Scenario 3: Ransomware Infection
Impact: All recent backups encrypted
Prevention:
- Immutable backups (WORM)
- Air-gapped backups (offline)
- Archive-only old backups
- Regular backup verification
Recovery:
- Use air-gapped backup
- Restore to clean environment
- Full security remediation
Scenario 4: Accidental Data Deletion
Impact: Data loss from point of deletion
Prevention:
- Frequent backups (hourly)
- Soft deletes in application
- Audit logging
Recovery:
- Restore from backup before deletion time
- Point-in-time recovery if available
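Restoring "from backup before deletion time" means choosing the newest backup whose timestamp precedes the incident. The sketch below selects it from a list of object keys, assuming the `vapora-db-YYYYMMDD-HHMMSS.sql.gz` naming produced by the database backup script and that the cutoff is expressed in the same time zone as the file names. `latest_backup_before` is a hypothetical helper.

```shell
#!/bin/sh
# Pick the newest database backup taken strictly before a cutoff.
# stdin: backup file names or S3 keys, one per line, ending in
#        vapora-db-YYYYMMDD-HHMMSS.sql.gz
# $1:    cutoff as YYYYMMDD-HHMMSS
# Prints the matching key, or nothing if no backup precedes the cutoff.
latest_backup_before() {
    cutoff=$1
    # Prefix each key with its timestamp, filter, sort, and keep the last one
    sed -n 's/^.*vapora-db-\([0-9]\{8\}-[0-9]\{6\}\)\.sql\.gz$/\1 &/p' |
        awk -v c="$cutoff" '($1 "") < (c "") { print }' |
        sort |
        tail -n 1 |
        cut -d' ' -f2-
}
```

For a deletion noticed at 01:30 on 2024-06-10, piping the key listing into `latest_backup_before 20240610-013000` would return the 01:00 backup, so at most the writes made between 01:00 and the deletion need to be replayed or accepted as lost.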
Backup Checklists
Daily
- Database backup completed
- Backup size normal (not 0 bytes)
- No backup errors in logs
- Upload to S3 succeeded
- Previous backup still available
Weekly
- Database backup retention verified
- Config backup completed
- Infrastructure code backed up
- Backup storage space adequate
- Encryption keys accessible
Monthly
- Restore test scheduled
- Backup audit report generated
- Backup verification successful
- Archive backups created
- Old backups properly retained
Quarterly
- Full audit report completed
- Backup strategy reviewed
- Team trained on procedures
- RTO/RPO targets met
- Recommendations implemented
Summary
Backup Strategy at a Glance:
| Item | Frequency | Retention | Storage | Encryption |
|---|---|---|---|---|
| Database | Hourly | 30 days | S3 | AES-256 |
| Config | Daily | 90 days | S3 | AES-256 |
| IaC | Daily | 30 days | Git + S3 | AES-256 |
| Images | Daily | 30 days | Registry | Built-in |
| Archive | Monthly | 7 years | Glacier | AES-256 |
Key Metrics:
- RPO: 1 hour (lose at most 1 hour of data)
- RTO: 4 hours (restore within 4 hours)
- Availability: 99.9% (backups available when needed)
- Validation: 100% (all backups tested monthly)
Success Criteria:
- ✅ Daily backup completion
- ✅ Backup validation passes
- ✅ Monthly restore test successful
- ✅ No security incidents
- ✅ Compliance requirements met