Vapora/docs/operations/backup-recovery-automation.md
Jesús Pérez 7110ffeea2
chore: extend doc: adr, tutorials, operations, etc
2026-01-12 03:32:47 +00:00

VAPORA Backup & Recovery Automation

Automated backup and recovery procedures using Nushell scripts and Kubernetes CronJobs. Supports both direct S3 backups and Restic-based incremental backups.


Overview

Backup Strategy:

  • Hourly: Database export + Restic backup (1-hour RPO)
  • Daily: Kubernetes config backup + Restic backup
  • Monthly: Cleanup old snapshots and archive

Dual Backup Approach:

  • S3 Direct: Simple file upload for quick recovery
  • Restic: Incremental, deduplicated backups with integrated encryption

Recovery Procedures:

  • One-command restore from S3 or Restic
  • Verification before committing to production
  • Automated database readiness checks

Files and Components

Backup Scripts

All scripts strictly follow NUSHELL_GUIDELINES.md and target Nushell 0.109.0+.

scripts/backup/database-backup.nu

Direct S3 backup of SurrealDB with encryption.

nu scripts/backup/database-backup.nu \
  --surreal-url "ws://localhost:8000" \
  --surreal-user "root" \
  --surreal-pass "$SURREAL_PASS" \
  --s3-bucket "vapora-backups" \
  --s3-prefix "backups/database" \
  --encryption-key "$ENCRYPTION_KEY_FILE"

Process:

  1. Export SurrealDB to SQL
  2. Compress with gzip
  3. Encrypt with AES-256
  4. Upload to S3 with metadata
  5. Verify upload completed

Output: s3://vapora-backups/backups/database/database-YYYYMMDD-HHMMSS.sql.gz.enc
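
The compress and encrypt steps, and their inverse used during recovery, can be sketched in plain shell. This is an illustration under assumptions, not the contents of database-backup.nu: the function names are hypothetical, the cipher mode is assumed to be AES-256-CBC with a PBKDF2-derived key (the source only says "AES-256"), and the key file is assumed to hold a single-line passphrase.

```shell
# Hypothetical helpers; the real Nushell script may use a different tool or mode.

# protect_backup <export-file> <key-file> <output-file>
# Compresses the export, then encrypts it with the passphrase in <key-file>.
protect_backup() {
  src=$1; key=$2; out=$3
  gzip -c "$src" > "$out.tmp" || return 1
  openssl enc -aes-256-cbc -pbkdf2 -salt -pass "file:$key" \
    -in "$out.tmp" -out "$out"
  status=$?
  rm -f "$out.tmp"          # never leave the unencrypted archive behind
  return "$status"
}

# unprotect_backup <encrypted-file> <key-file> <output-file>
# Decrypts, then decompresses; mirrors recovery steps "decrypt" and "decompress".
unprotect_backup() {
  src=$1; key=$2; out=$3
  openssl enc -d -aes-256-cbc -pbkdf2 -pass "file:$key" \
    -in "$src" -out "$out.gz" || return 1
  gunzip -c "$out.gz" > "$out"
  status=$?
  rm -f "$out.gz"
  return "$status"
}
```

A round trip on a small local file (protect, unprotect, then cmp against the original) is a cheap way to confirm that the key file and cipher options match before relying on the uploaded object.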

scripts/backup/config-backup.nu

Back up Kubernetes resources (ConfigMaps, Secrets, Deployments, Services, Ingress).

nu scripts/backup/config-backup.nu \
  --namespace "vapora" \
  --s3-bucket "vapora-backups" \
  --s3-prefix "backups/config"

Process:

  1. Export ConfigMaps from namespace
  2. Export Secrets
  3. Export Deployments, Services, Ingress
  4. Compress all to tar.gz
  5. Upload to S3

Output: s3://vapora-backups/backups/config/configs-YYYYMMDD-HHMMSS.tar.gz
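
A rough shell equivalent of the export and compress steps is shown below. It is a sketch only: export_config and the KUBECTL override are hypothetical names, and the per-kind file layout is an assumption rather than what config-backup.nu actually writes.

```shell
# KUBECTL can be overridden (for example with a stub) when testing.
KUBECTL="${KUBECTL:-kubectl}"

# export_config <namespace> <out-dir>
# Writes one YAML file per resource kind, then packs the directory
# into <out-dir>.tar.gz next to it.
export_config() {
  ns=$1; out=$2
  mkdir -p "$out" || return 1
  for kind in configmaps secrets deployments services ingress; do
    "$KUBECTL" get "$kind" -n "$ns" -o yaml > "$out/$kind.yaml" || return 1
  done
  tar -czf "$out.tar.gz" -C "$(dirname "$out")" "$(basename "$out")"
}
```

Note that exported Secret manifests are only base64-encoded, so the resulting archive should be treated as sensitive material.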

scripts/backup/restic-backup.nu

Incremental, deduplicated backup using Restic.

nu scripts/backup/restic-backup.nu \
  --repo "s3:s3.amazonaws.com/vapora-backups/restic" \
  --password "$RESTIC_PASSWORD" \
  --database-dir "/tmp/vapora-db-backup" \
  --k8s-dir "/tmp/vapora-k8s-backup" \
  --iac-dir "provisioning" \
  --backup-db \
  --backup-k8s \
  --backup-iac \
  --verify \
  --cleanup \
  --keep-daily 7 \
  --keep-weekly 4 \
  --keep-monthly 12

Features:

  • Incremental backups (only changed data stored)
  • Deduplication across snapshots
  • Built-in compression and encryption
  • Automatic retention policies
  • Repository health verification

Output: Tagged snapshots in Restic repository with metadata
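
For orientation, the flags above map onto a small number of Restic subcommands. The sketch below assumes the script runs backup, check, and forget in that order (the source does not state the order); restic_cycle and the RESTIC override are hypothetical names, and Restic is expected to read the repository password from the RESTIC_PASSWORD environment variable.

```shell
# RESTIC can be overridden (for example with a stub) when testing.
RESTIC="${RESTIC:-restic}"

# restic_cycle <repo> <directory> <tag>
# Takes a tagged snapshot, verifies the repository, then applies the
# 7 daily / 4 weekly / 12 monthly retention policy and prunes unreferenced data.
restic_cycle() {
  repo=$1; dir=$2; tag=$3
  "$RESTIC" -r "$repo" backup "$dir" --tag "$tag" || return 1
  "$RESTIC" -r "$repo" check || return 1
  "$RESTIC" -r "$repo" forget \
    --keep-daily 7 --keep-weekly 4 --keep-monthly 12 --prune
}
```

Running check before forget means a damaged repository is detected before any snapshot is removed.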

scripts/orchestrate-backup-recovery.nu

Coordinates all backup types (S3 + Restic).

# Full backup cycle
nu scripts/orchestrate-backup-recovery.nu \
  --operation backup \
  --mode full \
  --surreal-url "ws://localhost:8000" \
  --surreal-user "root" \
  --surreal-pass "$SURREAL_PASS" \
  --namespace "vapora" \
  --s3-bucket "vapora-backups" \
  --s3-prefix "backups/database" \
  --encryption-key "$ENCRYPTION_KEY_FILE" \
  --restic-repo "s3:s3.amazonaws.com/vapora-backups/restic" \
  --restic-password "$RESTIC_PASSWORD" \
  --iac-dir "provisioning"

Modes:

  • full: Database export → S3 + Restic
  • database-only: Database export only
  • config-only: Kubernetes config only

Recovery Scripts

scripts/recovery/database-recovery.nu

Restore SurrealDB from S3 backup (with decryption).

nu scripts/recovery/database-recovery.nu \
  --s3-location "s3://vapora-backups/backups/database/database-20260112-010000.sql.gz.enc" \
  --encryption-key "$ENCRYPTION_KEY_FILE" \
  --surreal-url "ws://localhost:8000" \
  --surreal-user "root" \
  --surreal-pass "$SURREAL_PASS" \
  --namespace "vapora" \
  --statefulset "surrealdb" \
  --pvc "surrealdb-data-surrealdb-0" \
  --verify

Process:

  1. Download encrypted backup from S3
  2. Decrypt backup file
  3. Decompress backup
  4. Scale down StatefulSet (for PVC replacement)
  5. Delete current PVC
  6. Scale up StatefulSet (creates new PVC)
  7. Wait for pod readiness
  8. Import backup to database
  9. Verify data integrity

Output: Restored database at specified SurrealDB URL
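
The volume replacement in steps 4 to 7 corresponds roughly to the kubectl sequence below. This is a sketch, not the script itself: reset_volume and the KUBECTL override are hypothetical, a single replica is assumed (suggested by the PVC name ending in -0), and the 300-second timeout is an arbitrary choice.

```shell
# KUBECTL can be overridden (for example with a stub) when testing.
KUBECTL="${KUBECTL:-kubectl}"

# reset_volume <namespace> <statefulset> <pvc>
# Stops the database, drops its volume, and starts it again on a fresh PVC.
reset_volume() {
  ns=$1; sts=$2; pvc=$3
  "$KUBECTL" scale statefulset "$sts" --replicas=0 -n "$ns" || return 1
  "$KUBECTL" delete pvc "$pvc" -n "$ns" --wait=true || return 1
  "$KUBECTL" scale statefulset "$sts" --replicas=1 -n "$ns" || return 1
  # Blocks until the new pod reports ready, or fails after the timeout.
  "$KUBECTL" rollout status "statefulset/$sts" -n "$ns" --timeout=300s
}
```

For the values used in this document the call would be reset_volume vapora surrealdb surrealdb-data-surrealdb-0. The import step should only start once this function returns successfully.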

scripts/orchestrate-backup-recovery.nu (Recovery Mode)

One-command recovery from backup.

nu scripts/orchestrate-backup-recovery.nu \
  --operation recovery \
  --s3-location "s3://vapora-backups/backups/database/database-20260112-010000.sql.gz.enc" \
  --encryption-key "$ENCRYPTION_KEY_FILE" \
  --surreal-url "ws://localhost:8000" \
  --surreal-user "root" \
  --surreal-pass "$SURREAL_PASS"

Verification Scripts

scripts/verify-backup-health.nu

Health check for backup infrastructure.

# Basic health check
nu scripts/verify-backup-health.nu \
  --s3-bucket "vapora-backups" \
  --s3-prefix "backups/database" \
  --restic-repo "s3:s3.amazonaws.com/vapora-backups/restic" \
  --restic-password "$RESTIC_PASSWORD" \
  --surreal-url "ws://localhost:8000" \
  --surreal-user "root" \
  --surreal-pass "$SURREAL_PASS" \
  --max-age-hours 25

Checks Performed:

  • ✓ S3 backups exist and have content
  • ✓ Restic repository accessible and has snapshots
  • ✓ Database connectivity verified
  • ✓ Backup freshness (< 25 hours old)
  • ✓ Backup rotation policy (daily, weekly, monthly)
  • ✓ Restore test (if --full-test specified)

Output: Pass/fail for each check with detailed status
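
The freshness check reduces to comparing a backup's age against --max-age-hours. A minimal sketch, assuming both timestamps are already available as Unix epoch seconds (is_fresh is a hypothetical helper, not part of verify-backup-health.nu):

```shell
# is_fresh <backup-epoch> <now-epoch> <max-age-hours>
# Succeeds when the backup is strictly younger than the limit.
# A backup timestamp in the future is treated as a failure.
is_fresh() {
  age=$(( $2 - $1 ))
  [ "$age" -ge 0 ] && [ "$age" -lt $(( $3 * 3600 )) ]
}
```

With hourly backups, a 25-hour limit leaves a wide margin: the check only fails after roughly a day of missed runs, which avoids alerts for a single skipped job.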


Kubernetes Automation

CronJob Configuration

File: kubernetes/09-backup-cronjobs.yaml

Defines four automated CronJobs:

1. Hourly Database Backup

schedule: "0 * * * *"  # Every hour
timeout: 1800 seconds  # 30 minutes

Runs orchestrate-backup-recovery.nu --operation backup --mode full

Backs up:

  • SurrealDB to S3 (encrypted)
  • SurrealDB to Restic (incremental)
  • IaC to Restic
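
To show how the schedule and timeout could appear in a manifest, the function below prints a minimal CronJob. It is an assumption-heavy sketch and not a copy of kubernetes/09-backup-cronjobs.yaml: the timeout is mapped to activeDeadlineSeconds, the image name is a placeholder, batch/v1 requires Kubernetes 1.21 or later, and render_backup_cronjob is a hypothetical helper. Secret and ConfigMap wiring is omitted.

```shell
# render_backup_cronjob <name> <schedule> <deadline-seconds>
# Prints a CronJob manifest to stdout.
render_backup_cronjob() {
cat <<EOF
apiVersion: batch/v1
kind: CronJob
metadata:
  name: $1
  namespace: vapora
spec:
  schedule: "$2"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      activeDeadlineSeconds: $3
      template:
        metadata:
          labels:
            backup-type: database
        spec:
          restartPolicy: OnFailure
          containers:
            - name: backup
              image: example.invalid/vapora-backup:latest  # placeholder image
              command: ["nu", "scripts/orchestrate-backup-recovery.nu",
                        "--operation", "backup", "--mode", "full"]
EOF
}
```

The output can be checked without touching the cluster by piping it to kubectl apply --dry-run=client -f -. Setting concurrencyPolicy to Forbid is a design choice here so that a slow run is not overlapped by the next hourly run.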

2. Daily Configuration Backup

schedule: "0 2 * * *"  # 02:00 UTC daily
timeout: 3600 seconds  # 60 minutes

Runs config-backup.nu for Kubernetes resources.

3. Daily Health Verification

schedule: "0 3 * * *"  # 03:00 UTC daily
timeout: 900 seconds   # 15 minutes

Runs verify-backup-health.nu to verify backup infrastructure.

Alerts if:

  • No S3 backups found
  • Restic repository inaccessible
  • Database unreachable
  • Backups older than 25 hours
  • Rotation policy violated

4. Monthly Backup Rotation

schedule: "0 4 1 * *"  # First day of month, 04:00 UTC
timeout: 3600 seconds

Cleans up old Restic snapshots per retention policy:

  • Keep: 7 daily, 4 weekly, 12 monthly
  • Prune: Remove unreferenced data

Environment Configuration

CronJobs require these secrets and ConfigMaps:

ConfigMap: vapora-config

backup_s3_bucket: "vapora-backups"
restic_repo: "s3:s3.amazonaws.com/vapora-backups/restic"
aws_region: "us-east-1"

Secret: vapora-secrets

surreal_password: "<database-password>"
restic_password: "<restic-encryption-password>"

Secret: vapora-aws-credentials

access_key_id: "<aws-access-key>"
secret_access_key: "<aws-secret-key>"

Secret: vapora-encryption-key

# File containing AES-256 encryption key
encryption.key: "<binary-key-data>"

Deployment

  1. Create secrets (if not existing):
kubectl create secret generic vapora-secrets \
  --from-literal=surreal_password="$SURREAL_PASS" \
  --from-literal=restic_password="$RESTIC_PASSWORD" \
  -n vapora

kubectl create secret generic vapora-aws-credentials \
  --from-literal=access_key_id="$AWS_ACCESS_KEY_ID" \
  --from-literal=secret_access_key="$AWS_SECRET_ACCESS_KEY" \
  -n vapora

kubectl create secret generic vapora-encryption-key \
  --from-file=encryption.key=/path/to/encryption.key \
  -n vapora
  2. Deploy CronJobs:
kubectl apply -f kubernetes/09-backup-cronjobs.yaml
  3. Verify CronJobs:
kubectl get cronjobs -n vapora
kubectl describe cronjob vapora-backup-database-hourly -n vapora
  4. Monitor scheduled runs:
# Watch CronJob executions
kubectl get jobs -n vapora -l job-type=backup --watch

# View logs from backup job
kubectl logs -n vapora -l backup-type=database --tail=100 -f

Setup Instructions

Prerequisites

  • Kubernetes 1.18+ with CronJob support
  • Nushell 0.109.0+
  • AWS CLI v2+
  • Restic installed (or container image with restic)
  • SurrealDB CLI (surreal command)
  • kubectl with cluster access

Local Testing

  1. Set up environment variables:
export SURREAL_URL="ws://localhost:8000"
export SURREAL_USER="root"
export SURREAL_PASS="password"
export S3_BUCKET="vapora-backups"
export ENCRYPTION_KEY_FILE="/path/to/encryption.key"
export RESTIC_REPO="s3:s3.amazonaws.com/vapora-backups/restic"
export RESTIC_PASSWORD="restic-password"
export AWS_REGION="us-east-1"
export AWS_ACCESS_KEY_ID="your-key"
export AWS_SECRET_ACCESS_KEY="your-secret"
  2. Run backup:
nu scripts/orchestrate-backup-recovery.nu \
  --operation backup \
  --mode full \
  --surreal-url "$SURREAL_URL" \
  --surreal-user "$SURREAL_USER" \
  --surreal-pass "$SURREAL_PASS" \
  --s3-bucket "$S3_BUCKET" \
  --s3-prefix "backups/database" \
  --encryption-key "$ENCRYPTION_KEY_FILE" \
  --restic-repo "$RESTIC_REPO" \
  --restic-password "$RESTIC_PASSWORD" \
  --iac-dir "provisioning"
  3. Verify backup:
nu scripts/verify-backup-health.nu \
  --s3-bucket "$S3_BUCKET" \
  --s3-prefix "backups/database" \
  --restic-repo "$RESTIC_REPO" \
  --restic-password "$RESTIC_PASSWORD" \
  --surreal-url "$SURREAL_URL" \
  --surreal-user "$SURREAL_USER" \
  --surreal-pass "$SURREAL_PASS"
  4. Test recovery:
# First, list available backups
aws s3 ls s3://$S3_BUCKET/backups/database/

# Then recover from latest backup
nu scripts/orchestrate-backup-recovery.nu \
  --operation recovery \
  --s3-location "s3://$S3_BUCKET/backups/database/database-20260112-010000.sql.gz.enc" \
  --encryption-key "$ENCRYPTION_KEY_FILE" \
  --surreal-url "$SURREAL_URL" \
  --surreal-user "$SURREAL_USER" \
  --surreal-pass "$SURREAL_PASS"
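
Picking the newest backup from the listing can be automated because the timestamp in the file name sorts lexicographically. A small sketch, assuming the usual four-column aws s3 ls output (date, time, size, name); latest_backup_key is a hypothetical helper:

```shell
# latest_backup_key
# Reads `aws s3 ls` output on stdin and prints the newest database backup name.
# Lines that do not match database-YYYYMMDD-HHMMSS.sql.gz.enc are ignored.
latest_backup_key() {
  awk '{print $4}' \
    | grep -E '^database-[0-9]{8}-[0-9]{6}\.sql\.gz\.enc$' \
    | sort \
    | tail -n 1
}
```

Usage would look like: latest=$(aws s3 ls "s3://$S3_BUCKET/backups/database/" | latest_backup_key), after which "s3://$S3_BUCKET/backups/database/$latest" can be passed as --s3-location.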

Production Deployment

  1. Create S3 bucket for backups:
aws s3 mb s3://vapora-backups --region us-east-1
  2. Enable bucket versioning for protection:
aws s3api put-bucket-versioning \
  --bucket vapora-backups \
  --versioning-configuration Status=Enabled
  3. Set lifecycle policy for Glacier archival (optional):
# 30 days to standard-IA, 90 days to Glacier
aws s3api put-bucket-lifecycle-configuration \
  --bucket vapora-backups \
  --lifecycle-configuration file://s3-lifecycle-policy.json
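  4. Create Restic repository:
export RESTIC_REPO="s3:s3.amazonaws.com/vapora-backups/restic"
export RESTIC_PASSWORD="your-restic-password"

# restic reads RESTIC_REPOSITORY, not RESTIC_REPO, so pass the repository explicitly
restic -r "$RESTIC_REPO" init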
  5. Deploy to Kubernetes:
# 1. Create namespace
kubectl create namespace vapora

# 2. Create secrets (also create vapora-aws-credentials and vapora-encryption-key as shown under Deployment)
kubectl create secret generic vapora-secrets \
  --from-literal=surreal_password="$SURREAL_PASS" \
  --from-literal=restic_password="$RESTIC_PASSWORD" \
  -n vapora

# 3. Create ConfigMap
kubectl create configmap vapora-config \
  --from-literal=backup_s3_bucket="vapora-backups" \
  --from-literal=restic_repo="s3:s3.amazonaws.com/vapora-backups/restic" \
  --from-literal=aws_region="us-east-1" \
  -n vapora

# 4. Deploy CronJobs
kubectl apply -f kubernetes/09-backup-cronjobs.yaml
  6. Monitor:
# Watch CronJobs
kubectl get cronjobs -n vapora --watch

# View backup logs
kubectl logs -n vapora -l backup-type=database -f

# Check health status
kubectl get jobs -n vapora -l job-type=health-check -o wide
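
The optional lifecycle step above references s3-lifecycle-policy.json without showing it. One possible version, matching the "30 days to standard-IA, 90 days to Glacier" comment, is sketched below. The rule ID and the backups/ prefix filter are assumptions; write_lifecycle_policy is a hypothetical helper.

```shell
# write_lifecycle_policy <output-file>
# Writes an S3 lifecycle configuration that tiers objects under backups/.
write_lifecycle_policy() {
cat > "$1" <<'EOF'
{
  "Rules": [
    {
      "ID": "vapora-backup-archival",
      "Status": "Enabled",
      "Filter": { "Prefix": "backups/" },
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER" }
      ]
    }
  ]
}
EOF
}
```

Scoping the rule to backups/ is deliberate in this sketch: the Restic repository lives under restic/ in the same bucket, and Restic needs direct read access to its data, so moving those objects to Glacier would likely break restores and checks.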

Emergency Recovery

Complete Database Loss

If the production database is lost, restore from backup:

# 1. Scale down StatefulSet
kubectl scale statefulset surrealdb --replicas=0 -n vapora

# 2. Delete current PVC
kubectl delete pvc surrealdb-data-surrealdb-0 -n vapora

# 3. Run recovery
nu scripts/orchestrate-backup-recovery.nu \
  --operation recovery \
  --s3-location "s3://vapora-backups/backups/database/database-LATEST.sql.gz.enc" \
  --encryption-key "/path/to/encryption.key" \
  --surreal-url "ws://surrealdb:8000" \
  --surreal-user "root" \
  --surreal-pass "$SURREAL_PASS"

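# 4. Verify database restored
# (surreal sql reads the query from stdin; add --ns and --db for your namespace and database)
echo "SELECT count() FROM projects GROUP ALL;" | \
  kubectl exec -i -n vapora surrealdb-0 -- \
  surreal sql \
    --conn ws://localhost:8000 \
    --user root \
    --pass "$SURREAL_PASS"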

Backup Verification Failed

If the health check fails:

  1. Check Restic repository:
export RESTIC_PASSWORD="$RESTIC_PASSWORD"
restic -r "s3:s3.amazonaws.com/vapora-backups/restic" check
  2. Force full verification (slow):
restic -r "s3:s3.amazonaws.com/vapora-backups/restic" check --read-data
  3. List recent snapshots:
restic -r "s3:s3.amazonaws.com/vapora-backups/restic" snapshots --latest 10

Troubleshooting

| Issue | Cause | Solution |
|-------|-------|----------|
| CronJob not running | Schedule incorrect | Check kubectl get cronjobs and verify schedule format |
| Backup file too large | Database growing | Check for old data that can be cleaned up |
| S3 upload fails | Credentials invalid | Verify AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY |
| Restic backup slow | First backup or network latency | Expected on first run; use --keep-* flags to limit retention |
| Recovery fails | Database already running | Scale down StatefulSet before recovery |
| Encryption key missing | Secret not created | Create vapora-encryption-key secret in namespace |

Related Documentation

  • Disaster Recovery Procedures: docs/disaster-recovery/README.md
  • Backup Strategy: docs/disaster-recovery/backup-strategy.md
  • Database Recovery: docs/disaster-recovery/database-recovery-procedures.md
  • Operations Guide: docs/operations/README.md

Last Updated: January 12, 2026
Status: Production-Ready
Automation: Full CronJob automation with health checks