# VAPORA Automated Backup & Recovery Automation

Automated backup and recovery procedures using Nushell scripts and Kubernetes CronJobs. Supports both direct S3 backups and Restic-based incremental backups.

---

## Overview

**Backup Strategy**:

- Hourly: Database export + Restic backup (1-hour RPO)
- Daily: Kubernetes config backup + Restic backup
- Monthly: Cleanup old snapshots and archive

**Dual Backup Approach**:

- **S3 Direct**: Simple file upload for quick recovery
- **Restic**: Incremental, deduplicated backups with integrated encryption

**Recovery Procedures**:

- One-command restore from S3 or Restic
- Verification before committing to production
- Automated database readiness checks

---

## Files and Components

### Backup Scripts

All scripts follow NUSHELL_GUIDELINES.md (0.109.0+) strictly.

#### `scripts/backup/database-backup.nu`

Direct S3 backup of SurrealDB with encryption.

```bash
nu scripts/backup/database-backup.nu \
  --surreal-url "ws://localhost:8000" \
  --surreal-user "root" \
  --surreal-pass "$SURREAL_PASS" \
  --s3-bucket "vapora-backups" \
  --s3-prefix "backups/database" \
  --encryption-key "$ENCRYPTION_KEY_FILE"
```

**Process**:

1. Export SurrealDB to SQL
2. Compress with gzip
3. Encrypt with AES-256
4. Upload to S3 with metadata
5. Verify upload completed

**Output**: `s3://vapora-backups/backups/database/database-YYYYMMDD-HHMMSS.sql.gz.enc`

#### `scripts/backup/config-backup.nu`

Backup Kubernetes resources (ConfigMaps, Secrets, Deployments).

```bash
nu scripts/backup/config-backup.nu \
  --namespace "vapora" \
  --s3-bucket "vapora-backups" \
  --s3-prefix "backups/config"
```

**Process**:

1. Export ConfigMaps from namespace
2. Export Secrets
3. Export Deployments, Services, Ingress
4. Compress all to tar.gz
5. Upload to S3

**Output**: `s3://vapora-backups/backups/config/configs-YYYYMMDD-HHMMSS.tar.gz`

#### `scripts/backup/restic-backup.nu`

Incremental, deduplicated backup using Restic.
```bash
nu scripts/backup/restic-backup.nu \
  --repo "s3:s3.amazonaws.com/vapora-backups/restic" \
  --password "$RESTIC_PASSWORD" \
  --database-dir "/tmp/vapora-db-backup" \
  --k8s-dir "/tmp/vapora-k8s-backup" \
  --iac-dir "provisioning" \
  --backup-db \
  --backup-k8s \
  --backup-iac \
  --verify \
  --cleanup \
  --keep-daily 7 \
  --keep-weekly 4 \
  --keep-monthly 12
```

**Features**:

- Incremental backups (only changed data stored)
- Deduplication across snapshots
- Built-in compression and encryption
- Automatic retention policies
- Repository health verification

**Output**: Tagged snapshots in Restic repository with metadata

#### `scripts/orchestrate-backup-recovery.nu`

Coordinates all backup types (S3 + Restic).

```bash
# Full backup cycle
nu scripts/orchestrate-backup-recovery.nu \
  --operation backup \
  --mode full \
  --surreal-url "ws://localhost:8000" \
  --surreal-user "root" \
  --surreal-pass "$SURREAL_PASS" \
  --namespace "vapora" \
  --s3-bucket "vapora-backups" \
  --s3-prefix "backups/database" \
  --encryption-key "$ENCRYPTION_KEY_FILE" \
  --restic-repo "s3:s3.amazonaws.com/vapora-backups/restic" \
  --restic-password "$RESTIC_PASSWORD" \
  --iac-dir "provisioning"
```

**Modes**:

- `full`: Database export → S3 + Restic
- `database-only`: Database export only
- `config-only`: Kubernetes config only

### Recovery Scripts

#### `scripts/recovery/database-recovery.nu`

Restore SurrealDB from S3 backup (with decryption).

```bash
nu scripts/recovery/database-recovery.nu \
  --s3-location "s3://vapora-backups/backups/database/database-20260112-010000.sql.gz.enc" \
  --encryption-key "$ENCRYPTION_KEY_FILE" \
  --surreal-url "ws://localhost:8000" \
  --surreal-user "root" \
  --surreal-pass "$SURREAL_PASS" \
  --namespace "vapora" \
  --statefulset "surrealdb" \
  --pvc "surrealdb-data-surrealdb-0" \
  --verify
```

**Process**:

1. Download encrypted backup from S3
2. Decrypt backup file
3. Decompress backup
4. Scale down StatefulSet (for PVC replacement)
5. Delete current PVC
6. Scale up StatefulSet (creates new PVC)
7. Wait for pod readiness
8. Import backup to database
9. Verify data integrity

**Output**: Restored database at specified SurrealDB URL

#### `scripts/orchestrate-backup-recovery.nu` (Recovery Mode)

One-command recovery from backup.

```bash
nu scripts/orchestrate-backup-recovery.nu \
  --operation recovery \
  --s3-location "s3://vapora-backups/backups/database/database-20260112-010000.sql.gz.enc" \
  --encryption-key "$ENCRYPTION_KEY_FILE" \
  --surreal-url "ws://localhost:8000" \
  --surreal-user "root" \
  --surreal-pass "$SURREAL_PASS"
```

### Verification Scripts

#### `scripts/verify-backup-health.nu`

Health check for backup infrastructure.

```bash
# Basic health check
nu scripts/verify-backup-health.nu \
  --s3-bucket "vapora-backups" \
  --s3-prefix "backups/database" \
  --restic-repo "s3:s3.amazonaws.com/vapora-backups/restic" \
  --restic-password "$RESTIC_PASSWORD" \
  --surreal-url "ws://localhost:8000" \
  --surreal-user "root" \
  --surreal-pass "$SURREAL_PASS" \
  --max-age-hours 25
```

**Checks Performed**:

- ✓ S3 backups exist and have content
- ✓ Restic repository accessible and has snapshots
- ✓ Database connectivity verified
- ✓ Backup freshness (< 25 hours old)
- ✓ Backup rotation policy (daily, weekly, monthly)
- ✓ Restore test (if `--full-test` specified)

**Output**: Pass/fail for each check with detailed status

---

## Kubernetes Automation

### CronJob Configuration

File: `kubernetes/09-backup-cronjobs.yaml`

Defines four automated CronJobs:

#### 1. Hourly Database Backup

```yaml
schedule: "0 * * * *"      # Every hour
timeout: 1800 seconds      # 30 minutes
```

Runs `orchestrate-backup-recovery.nu --operation backup --mode full`.

**Backups**:

- SurrealDB to S3 (encrypted)
- SurrealDB to Restic (incremental)
- IaC to Restic

#### 2. Daily Configuration Backup

```yaml
schedule: "0 2 * * *"      # 02:00 UTC daily
timeout: 3600 seconds      # 60 minutes
```

Runs `config-backup.nu` for Kubernetes resources.

#### 3. Daily Health Verification

```yaml
schedule: "0 3 * * *"      # 03:00 UTC daily
timeout: 900 seconds       # 15 minutes
```

Runs `verify-backup-health.nu` to verify backup infrastructure.

**Alerts if**:

- No S3 backups found
- Restic repository inaccessible
- Database unreachable
- Backups older than 25 hours
- Rotation policy violated

#### 4. Monthly Backup Rotation

```yaml
schedule: "0 4 1 * *"      # First day of month, 04:00 UTC
timeout: 3600 seconds
```

Cleans up old Restic snapshots per retention policy:

- Keep: 7 daily, 4 weekly, 12 monthly
- Prune: Remove unreferenced data

### Environment Configuration

CronJobs require these secrets and ConfigMaps:

**ConfigMap: `vapora-config`**

```yaml
backup_s3_bucket: "vapora-backups"
restic_repo: "s3:s3.amazonaws.com/vapora-backups/restic"
aws_region: "us-east-1"
```

**Secret: `vapora-secrets`**

```yaml
surreal_password: ""
restic_password: ""
```

**Secret: `vapora-aws-credentials`**

```yaml
access_key_id: ""
secret_access_key: ""
```

**Secret: `vapora-encryption-key`**

```yaml
# File containing AES-256 encryption key
encryption.key: ""
```

### Deployment

1. **Create secrets** (if they do not already exist):

   ```bash
   kubectl create secret generic vapora-secrets \
     --from-literal=surreal_password="$SURREAL_PASS" \
     --from-literal=restic_password="$RESTIC_PASSWORD" \
     -n vapora

   kubectl create secret generic vapora-aws-credentials \
     --from-literal=access_key_id="$AWS_ACCESS_KEY_ID" \
     --from-literal=secret_access_key="$AWS_SECRET_ACCESS_KEY" \
     -n vapora

   kubectl create secret generic vapora-encryption-key \
     --from-file=encryption.key=/path/to/encryption.key \
     -n vapora
   ```

2. **Deploy CronJobs**:

   ```bash
   kubectl apply -f kubernetes/09-backup-cronjobs.yaml
   ```

3. **Verify CronJobs**:

   ```bash
   kubectl get cronjobs -n vapora
   kubectl describe cronjob vapora-backup-database-hourly -n vapora
   ```

4. **Monitor scheduled runs**:

   ```bash
   # Watch CronJob executions
   kubectl get jobs -n vapora -l job-type=backup --watch

   # View logs from backup job
   kubectl logs -n vapora -l backup-type=database --tail=100 -f
   ```

---

## Setup Instructions

### Prerequisites

- Kubernetes 1.18+ with CronJob support
- Nushell 0.109.0+
- AWS CLI v2+
- Restic installed (or container image with restic)
- SurrealDB CLI (`surreal` command)
- `kubectl` with cluster access

### Local Testing

1. **Set up environment variables**:

   ```bash
   export SURREAL_URL="ws://localhost:8000"
   export SURREAL_USER="root"
   export SURREAL_PASS="password"
   export S3_BUCKET="vapora-backups"
   export ENCRYPTION_KEY_FILE="/path/to/encryption.key"
   export RESTIC_REPO="s3:s3.amazonaws.com/vapora-backups/restic"
   export RESTIC_PASSWORD="restic-password"
   export AWS_REGION="us-east-1"
   export AWS_ACCESS_KEY_ID="your-key"
   export AWS_SECRET_ACCESS_KEY="your-secret"
   ```

2. **Run backup**:

   ```bash
   nu scripts/orchestrate-backup-recovery.nu \
     --operation backup \
     --mode full \
     --surreal-url "$SURREAL_URL" \
     --surreal-user "$SURREAL_USER" \
     --surreal-pass "$SURREAL_PASS" \
     --s3-bucket "$S3_BUCKET" \
     --s3-prefix "backups/database" \
     --encryption-key "$ENCRYPTION_KEY_FILE" \
     --restic-repo "$RESTIC_REPO" \
     --restic-password "$RESTIC_PASSWORD" \
     --iac-dir "provisioning"
   ```

3. **Verify backup**:

   ```bash
   nu scripts/verify-backup-health.nu \
     --s3-bucket "$S3_BUCKET" \
     --s3-prefix "backups/database" \
     --restic-repo "$RESTIC_REPO" \
     --restic-password "$RESTIC_PASSWORD" \
     --surreal-url "$SURREAL_URL" \
     --surreal-user "$SURREAL_USER" \
     --surreal-pass "$SURREAL_PASS"
   ```

4. **Test recovery**:

   ```bash
   # First, list available backups
   aws s3 ls "s3://$S3_BUCKET/backups/database/"

   # Then recover from the latest backup (replace the file name with one from the listing)
   nu scripts/orchestrate-backup-recovery.nu \
     --operation recovery \
     --s3-location "s3://$S3_BUCKET/backups/database/database-20260112-010000.sql.gz.enc" \
     --encryption-key "$ENCRYPTION_KEY_FILE" \
     --surreal-url "$SURREAL_URL" \
     --surreal-user "$SURREAL_USER" \
     --surreal-pass "$SURREAL_PASS"
   ```

### Production Deployment

1. **Create S3 bucket** for backups:

   ```bash
   aws s3 mb s3://vapora-backups --region us-east-1
   ```

2. **Enable bucket versioning** for protection:

   ```bash
   aws s3api put-bucket-versioning \
     --bucket vapora-backups \
     --versioning-configuration Status=Enabled
   ```

3. **Set lifecycle policy** for Glacier archival (optional):

   ```bash
   # 30 days to Standard-IA, 90 days to Glacier
   aws s3api put-bucket-lifecycle-configuration \
     --bucket vapora-backups \
     --lifecycle-configuration file://s3-lifecycle-policy.json
   ```

4. **Create Restic repository**:

   ```bash
   export RESTIC_REPO="s3:s3.amazonaws.com/vapora-backups/restic"
   export RESTIC_PASSWORD="your-restic-password"
   # restic itself reads RESTIC_REPOSITORY, not RESTIC_REPO, so pass the repository explicitly
   restic -r "$RESTIC_REPO" init
   ```

5. **Deploy to Kubernetes**:

   ```bash
   # 1. Create namespace
   kubectl create namespace vapora

   # 2. Create secrets
   kubectl create secret generic vapora-secrets \
     --from-literal=surreal_password="$SURREAL_PASS" \
     --from-literal=restic_password="$RESTIC_PASSWORD" \
     -n vapora

   # 3. Create ConfigMap
   kubectl create configmap vapora-config \
     --from-literal=backup_s3_bucket="vapora-backups" \
     --from-literal=restic_repo="s3:s3.amazonaws.com/vapora-backups/restic" \
     --from-literal=aws_region="us-east-1" \
     -n vapora

   # 4. Deploy CronJobs
   kubectl apply -f kubernetes/09-backup-cronjobs.yaml
   ```

6. **Monitor**:

   ```bash
   # Watch CronJobs
   kubectl get cronjobs -n vapora --watch

   # View backup logs
   kubectl logs -n vapora -l backup-type=database -f

   # Check health status
   kubectl get jobs -n vapora -l job-type=health-check -o wide
   ```

---

## Emergency Recovery

### Complete Database Loss

If the production database is lost, restore from backup:

```bash
# 1. Scale down StatefulSet
kubectl scale statefulset surrealdb --replicas=0 -n vapora

# 2. Delete current PVC
kubectl delete pvc surrealdb-data-surrealdb-0 -n vapora

# 3. Run recovery
nu scripts/orchestrate-backup-recovery.nu \
  --operation recovery \
  --s3-location "s3://vapora-backups/backups/database/database-LATEST.sql.gz.enc" \
  --encryption-key "/path/to/encryption.key" \
  --surreal-url "ws://surrealdb:8000" \
  --surreal-user "root" \
  --surreal-pass "$SURREAL_PASS"

# 4. Verify database restored
kubectl exec -n vapora surrealdb-0 -- \
  surreal query \
  --conn ws://localhost:8000 \
  --user root \
  --pass "$SURREAL_PASS" \
  "SELECT COUNT() FROM projects"
```

### Backup Verification Failed

If the health check fails:

1. **Check Restic repository**:

   ```bash
   export RESTIC_PASSWORD="$RESTIC_PASSWORD"
   restic -r "s3:s3.amazonaws.com/vapora-backups/restic" check
   ```

2. **Force full verification** (slow):

   ```bash
   restic -r "s3:s3.amazonaws.com/vapora-backups/restic" check --read-data
   ```

3. **List recent snapshots**:

   ```bash
   # --latest N shows the N newest snapshots per host and path
   restic -r "s3:s3.amazonaws.com/vapora-backups/restic" snapshots --latest 10
   ```

---

## Troubleshooting

| Issue | Cause | Solution |
|-------|-------|----------|
| **CronJob not running** | Schedule incorrect | Check `kubectl get cronjobs` and verify schedule format |
| **Backup file too large** | Database growing | Check for old data that can be cleaned up |
| **S3 upload fails** | Credentials invalid | Verify `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` |
| **Restic backup slow** | First backup or network latency | Expected on first run; use `--keep-*` flags to limit retention |
| **Recovery fails** | Database already running | Scale down StatefulSet before recovery |
| **Encryption key missing** | Secret not created | Create `vapora-encryption-key` secret in namespace |

---

## Related Documentation

- **Disaster Recovery Procedures**: `docs/disaster-recovery/README.md`
- **Backup Strategy**: `docs/disaster-recovery/backup-strategy.md`
- **Database Recovery**: `docs/disaster-recovery/database-recovery-procedures.md`
- **Operations Guide**: `docs/operations/README.md`

---

**Last Updated**: January 12, 2026
**Status**: Production-Ready
**Automation**: Full CronJob automation with health checks
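---

## Appendix: Resolving the Latest Backup

The emergency recovery command uses `database-LATEST.sql.gz.enc` as a placeholder for the newest backup. The following is a minimal sketch of how that name could be resolved from an `aws s3 ls` listing, assuming backups keep the `database-YYYYMMDD-HHMMSS.sql.gz.enc` naming scheme so that lexical order matches chronological order. The helper name `latest_backup_name` and the canned listing are hypothetical and not part of the VAPORA scripts.

```bash
#!/usr/bin/env bash
# Sketch: pick the newest database backup name from `aws s3 ls` output.

# Reads `aws s3 ls` listing lines on stdin (date, time, size, name) and
# prints the newest backup file name, or nothing if no name matches.
latest_backup_name() {
  awk '{ print $4 }' \
    | grep -E '^database-[0-9]{8}-[0-9]{6}\.sql\.gz\.enc$' \
    | sort \
    | tail -n 1
}

# Canned listing standing in for real `aws s3 ls` output (illustrative data).
sample_listing='2026-01-11 23:00:04    1048576 database-20260111-230000.sql.gz.enc
2026-01-12 01:00:05    1049000 database-20260112-010000.sql.gz.enc
2026-01-12 00:00:03    1048900 database-20260112-000000.sql.gz.enc'

latest="$(printf '%s\n' "$sample_listing" | latest_backup_name)"
echo "s3://vapora-backups/backups/database/${latest}"
```

Against a real bucket, the same function could be fed from `aws s3 ls "s3://$S3_BUCKET/backups/database/"`, and the resulting path passed to `--s3-location`. An empty result means no file matched the expected naming scheme and should be treated as a failure before attempting recovery.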