# Database Recovery Procedures

Detailed procedures for recovering SurrealDB in various failure scenarios.

---

## Quick Reference: Recovery Methods

| Scenario | Method | Time | Data Loss |
|----------|--------|------|-----------|
| **Pod restart** | Automatic pod recovery | 2 min | 0 |
| **Pod crash** | Persistent volume intact | 3 min | 0 |
| **Corrupted pod** | Restart from snapshot | 5 min | 0 |
| **Corrupted database** | Restore from backup | 15 min | 0-60 min |
| **Complete loss** | Restore from backup | 30 min | 0-60 min |

---

## SurrealDB Architecture

```
VAPORA Database Layer

SurrealDB Pod (Kubernetes)
├── PersistentVolume: /var/lib/surrealdb/
├── Data file: data.db (RocksDB)
├── Index files: *.idx
└── WAL (write-ahead log): *.wal

Backed up to:
├── Hourly exports: s3://vapora-backups/database/
├── Volume snapshots: AWS/GCP disk snapshots
└── Archive backups: Glacier (monthly)
```
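The hourly exports shown above are assumed to be produced by a job along these lines (a sketch: the `--ns`/`--db` names, file naming, and upload path mirror conventions used elsewhere in this runbook and should be checked against the real backup CronJob):

```bash
# One hourly export run (namespace/database names are assumptions)
TS=$(date -u +%Y-%m-%d-%H%M%S)
kubectl exec -n vapora surrealdb-0 -- \
  surreal export \
  --conn ws://localhost:8000 \
  --user root --pass "$DB_PASSWORD" \
  --ns vapora --db vapora \
  /tmp/surrealdb-$TS.sql
kubectl cp vapora/surrealdb-0:/tmp/surrealdb-$TS.sql ./surrealdb-$TS.sql
gzip surrealdb-$TS.sql
aws s3 cp surrealdb-$TS.sql.gz s3://vapora-backups/database/
```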
---

## Scenario 1: Pod Restart (Most Common)

**Cause**: Node maintenance, resource limits, health check failure

**Duration**: 2-3 minutes
**Data Loss**: None

### Recovery Procedure

```bash
# Most of the time, just restart the pod

# 1. Delete the pod
kubectl delete pod -n vapora surrealdb-0

# 2. Pod automatically restarts (via StatefulSet)
kubectl get pods -n vapora -w

# 3. Verify it's Ready
kubectl get pod surrealdb-0 -n vapora
# Should show: 1/1 Running

# 4. Verify database is accessible
kubectl exec -n vapora surrealdb-0 -- \
  surreal sql "SELECT 1"

# 5. Check data integrity
kubectl exec -n vapora surrealdb-0 -- \
  surreal sql "SELECT COUNT(*) FROM projects"
# Should return non-zero count
```
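If you prefer not to watch the pod list in step 2, `kubectl wait` blocks until the pod reports Ready (a sketch; the 180s timeout is an arbitrary choice, and the replacement pod object must already exist):

```bash
# Block until the restarted pod is Ready, then run the verification steps above
kubectl wait --for=condition=Ready pod/surrealdb-0 -n vapora --timeout=180s
```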
---

## Scenario 2: Pod CrashLoop (Container Issue)

**Cause**: Application crash, memory issues, corrupt index

**Duration**: 5-10 minutes
**Data Loss**: None (usually)

### Recovery Procedure

```bash
# 1. Examine pod logs to identify the issue
kubectl logs surrealdb-0 -n vapora --previous
# Look for: "panic", "fatal", "out of memory"

# 2. Increase resource limits if it is a memory issue
kubectl patch statefulset surrealdb -n vapora --type='json' \
  -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/resources/limits/memory", "value":"2Gi"}]'

# 3. If an index is corrupt, rebuild it
kubectl exec -n vapora surrealdb-0 -- \
  surreal sql "REBUILD INDEX"

# 4. If the issue persists, recreate the pod from a volume snapshot
kubectl delete pod -n vapora surrealdb-0
# Use the previous snapshot (if available)

# 5. Monitor the restart
kubectl get pods -n vapora -w
```
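After patching the limits in step 2, it is worth confirming the new value is actually in effect on the running container (a quick check, not part of the original procedure):

```bash
# Print the effective memory limit of the SurrealDB container
# (the pod must have restarted after the patch for this to show the new value)
kubectl get pod surrealdb-0 -n vapora \
  -o jsonpath='{.spec.containers[0].resources.limits.memory}'
```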
---

## Scenario 3: Corrupted Database (Detected via Queries)

**Cause**: Unclean shutdown, disk issue, data corruption

**Duration**: 15-30 minutes
**Data Loss**: Minimal (last hour of transactions)

### Detection

```bash
# Symptoms to watch for:
# ✗ Queries return error: "corrupted database"
# ✗ Disk check shows corruption
# ✗ Checksums fail
# ✗ Integrity check fails

# Verify corruption
kubectl exec -n vapora surrealdb-0 -- \
  surreal sql "INFO FOR DB"
# Look for any error messages

# Try a repair
kubectl exec -n vapora surrealdb-0 -- \
  surreal sql "REBUILD INDEX"
```
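In SurrealQL, `REBUILD INDEX` generally targets one named index on one table, so the repair above is usually run per index; a sketch with placeholder names:

```bash
# Rebuild one suspect index (index and table names are placeholders)
kubectl exec -n vapora surrealdb-0 -- \
  surreal sql "REBUILD INDEX IF EXISTS idx_project_name ON TABLE projects"
```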
### Recovery: Option A - Restart and Repair (Try First)

```bash
# 1. Delete the pod to force a restart
kubectl delete pod -n vapora surrealdb-0

# 2. Watch the restart
kubectl get pods -n vapora -w
# Should restart within 30 seconds

# 3. Verify the database is accessible
kubectl exec -n vapora surrealdb-0 -- \
  surreal sql "SELECT COUNT(*) FROM projects"

# 4. If successful, done
# If errors persist, proceed to Option B
```

### Recovery: Option B - Restore from Recent Backup

```bash
# 1. Stop the database pod
kubectl scale statefulset surrealdb --replicas=0 -n vapora

# 2. Download the latest backup
aws s3 cp s3://vapora-backups/database/ ./ --recursive
# Get the most recent .sql.gz file

# 3. Clear the corrupted data
kubectl delete pvc -n vapora surrealdb-data-surrealdb-0

# 4. Recreate the pod (will create a new PVC)
kubectl scale statefulset surrealdb --replicas=1 -n vapora

# 5. Wait for the pod to be ready
kubectl wait --for=condition=Ready pod/surrealdb-0 \
  -n vapora --timeout=300s

# 6. Restore the backup
# Extract and import
gunzip vapora-db-*.sql.gz
kubectl cp vapora-db-*.sql vapora/surrealdb-0:/tmp/

kubectl exec -n vapora surrealdb-0 -- \
  surreal import \
  --conn ws://localhost:8000 \
  --user root \
  --pass $DB_PASSWORD \
  --input /tmp/vapora-db-*.sql

# 7. Verify the restored data
kubectl exec -n vapora surrealdb-0 -- \
  surreal sql "SELECT COUNT(*) FROM projects"
# Should match the pre-corruption count
```
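Before wiping the PVC in step 3, it is worth a quick sanity check that the downloaded archive is usable (this only proves the gzip stream is intact, not that the export is complete):

```bash
# Verify the archive decompresses cleanly and peek at the first lines
gzip -t vapora-db-*.sql.gz && echo "archive OK"
zcat vapora-db-*.sql.gz | head -5
```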
---

## Scenario 4: Storage Failure (PVC Issue)

**Cause**: Storage volume corruption, node storage failure

**Duration**: 20-30 minutes
**Data Loss**: None with backup

### Recovery Procedure

```bash
# 1. Detect the storage issue
kubectl describe pvc -n vapora surrealdb-data-surrealdb-0
# Look for: "Pod pending", "volume binding failure"

# 2. Check whether a snapshot is available (cloud)
aws ec2 describe-snapshots \
  --filters "Name=tag:database,Values=vapora" \
  --query 'sort_by(Snapshots, &StartTime)[].{SnapshotId:SnapshotId,StartTime:StartTime}' \
  --output table | tail -10

# 3. Create a new PVC from the snapshot
kubectl apply -f - << EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: surrealdb-data-surrealdb-0-restore
  namespace: vapora
spec:
  accessModes:
    - ReadWriteOnce
  dataSource:
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
    name: surrealdb-snapshot-latest
  resources:
    requests:
      storage: 100Gi
EOF

# 4. Update the StatefulSet to use the new PVC
kubectl patch statefulset surrealdb -n vapora --type='json' \
  -p='[{"op": "replace", "path": "/spec/volumeClaimTemplates/0/metadata/name", "value":"surrealdb-data-surrealdb-0-restore"}]'

# 5. Delete the old pod to force a remount
kubectl delete pod -n vapora surrealdb-0

# 6. Verify the new pod runs
kubectl get pods -n vapora -w

# 7. Test the database
kubectl exec -n vapora surrealdb-0 -- \
  surreal sql "SELECT COUNT(*) FROM projects"
```
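Step 3 assumes a `VolumeSnapshot` named `surrealdb-snapshot-latest` already exists. A sketch of how such a snapshot is taken from the data PVC during normal operation (the `volumeSnapshotClassName` is an assumption and depends on the cluster's CSI driver):

```bash
# Create/refresh the snapshot that the restore PVC above points at
kubectl apply -f - << EOF
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: surrealdb-snapshot-latest
  namespace: vapora
spec:
  volumeSnapshotClassName: csi-snapclass   # assumption: depends on the CSI driver
  source:
    persistentVolumeClaimName: surrealdb-data-surrealdb-0
EOF
```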
---

## Scenario 5: Complete Data Loss (Restore from Backup)

**Cause**: User delete, accidental truncate, security incident

**Duration**: 30-60 minutes
**Data Loss**: Up to 1 hour

### Pre-Recovery Checklist

```
Before restoring, verify:
□ What data was lost? (specific tables or entire DB?)
□ When was it lost? (exact time if possible)
□ Is it just one table or the entire database?
□ Do we have valid backups from before the loss?
□ Has the backup been tested before?
```
### Recovery Procedure

```bash
# 1. Stop the database
kubectl scale statefulset surrealdb --replicas=0 -n vapora
sleep 10

# 2. Identify the backup to restore
# Look for a backup from a time BEFORE the data loss
aws s3 ls s3://vapora-backups/database/ --recursive | sort
# Example: surrealdb-2026-01-12-230000.sql.gz
# (from 11 PM, before the 12 AM loss)

# 3. Download the backup
aws s3 cp s3://vapora-backups/database/surrealdb-2026-01-12-230000.sql.gz ./

gunzip surrealdb-2026-01-12-230000.sql.gz

# 4. Verify backup integrity before restoring
# Check the first 100 lines for the expected format
head -100 surrealdb-2026-01-12-230000.sql

# 5. Delete the existing PVC
kubectl delete pvc -n vapora surrealdb-data-surrealdb-0

# 6. Restart the database pod (will create a new PVC)
kubectl scale statefulset surrealdb --replicas=1 -n vapora

# 7. Wait for the pod to be ready and listening
kubectl wait --for=condition=Ready pod/surrealdb-0 \
  -n vapora --timeout=300s
sleep 10

# 8. Copy the backup to the pod
kubectl cp surrealdb-2026-01-12-230000.sql vapora/surrealdb-0:/tmp/

# 9. Restore the backup
kubectl exec -n vapora surrealdb-0 -- \
  surreal import \
  --conn ws://localhost:8000 \
  --user root \
  --pass $DB_PASSWORD \
  --input /tmp/surrealdb-2026-01-12-230000.sql

# Expected output:
# Imported 1500+ records...
# This should take 5-15 minutes depending on backup size

# 10. Verify the data was restored
kubectl exec -n vapora surrealdb-0 -- \
  surreal sql \
  --conn ws://localhost:8000 \
  --user root \
  --pass $DB_PASSWORD \
  "SELECT COUNT(*) as project_count FROM projects"

# Should match the pre-loss count
```
### Data Loss Assessment

```bash
# After the restore, compare against the pre-loss state

# 1. Get the current record count
RESTORED_COUNT=$(kubectl exec -n vapora surrealdb-0 -- \
  surreal sql "SELECT COUNT(*) FROM projects")

# 2. Get the pre-loss count (from logs or the incident ticket)
PRE_LOSS_COUNT=1500

# 3. Calculate the data loss
if [ "$RESTORED_COUNT" -lt "$PRE_LOSS_COUNT" ]; then
  LOSS=$(( PRE_LOSS_COUNT - RESTORED_COUNT ))
  echo "Data loss: $LOSS records"
  echo "Data loss duration: ~1 hour"
  echo "Restore successful but incomplete"
else
  echo "Data loss: 0 records"
  echo "Full recovery complete"
fi
```
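The `-lt` comparison above assumes `$RESTORED_COUNT` is a bare integer; the CLI normally prints a formatted result, so in practice the number has to be extracted first. A crude sketch (verify against the actual output format of your `surreal` version):

```bash
# Keep only the last number from the query output before comparing
# (crude: breaks if the output contains other numbers such as timestamps)
RESTORED_COUNT=$(kubectl exec -n vapora surrealdb-0 -- \
  surreal sql "SELECT COUNT(*) FROM projects" | grep -oE '[0-9]+' | tail -1)
```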
---

## Scenario 6: Backup Verification Failed

**Cause**: Corrupt backup file, incompatible format

**Duration**: 30-120 minutes (fallback to an older backup)
**Data Loss**: 2+ hours possible

### Recovery Procedure

```bash
# 1. Identify backup corruption
# During a restore, the import fails:
kubectl exec -n vapora surrealdb-0 -- \
  surreal import \
  --conn ws://localhost:8000 \
  --user root \
  --pass $DB_PASSWORD \
  --input /tmp/backup.sql
# Error: "invalid SQL format" or similar

# 2. Check the backup file's integrity
file vapora-db-backup.sql
# Should show: ASCII text

head -5 vapora-db-backup.sql
# Should show: SQL statements or surreal export format

# 3. If corrupt, try the next-oldest backup
aws s3 ls s3://vapora-backups/database/ --recursive | sort | tail -5
# Pick the second-newest backup

# 4. Retry the restore with the older backup
aws s3 cp s3://vapora-backups/database/surrealdb-2026-01-12-210000.sql.gz ./
gunzip surrealdb-2026-01-12-210000.sql.gz

# 5. Repeat the restore procedure with the older backup
# (As in Scenario 5, steps 8-10)
```
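Corrupt backups are much cheaper to catch before an incident; a small verification pass like the following can run after each hourly export (a sketch that assumes the bucket layout and file naming used above):

```bash
# Pull the newest archive and confirm it decompresses and looks like an export
LATEST=$(aws s3 ls s3://vapora-backups/database/ --recursive | sort | tail -1 | awk '{print $4}')
aws s3 cp "s3://vapora-backups/$LATEST" ./latest-backup.sql.gz
gzip -t latest-backup.sql.gz && echo "latest backup decompresses cleanly"
zcat latest-backup.sql.gz | head -5
```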
---

## Scenario 7: Database Size Growing Unexpectedly

**Cause**: Accumulation of data, logs not rotated, storage leak

**Duration**: Varies (prevention focus)
**Data Loss**: None

### Detection

```bash
# Monitor database size
kubectl exec -n vapora surrealdb-0 -- du -sh /var/lib/surrealdb/

# Check disk usage trend
# (Should be ~1-2% growth per week)

# If there is a sudden spike, list the largest files:
kubectl exec -n vapora surrealdb-0 -- \
  find /var/lib/surrealdb/ -type f -exec ls -lh {} + | sort -k5 -h | tail -20
```
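Growth of the data directory only matters relative to the volume size, so also check how full the PVC itself is:

```bash
# Show free space on the volume backing /var/lib/surrealdb/
kubectl exec -n vapora surrealdb-0 -- df -h /var/lib/surrealdb/
```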
### Cleanup Procedure

```bash
# 1. Identify large tables
kubectl exec -n vapora surrealdb-0 -- \
  surreal sql "SELECT table, count(*) FROM meta::tb GROUP BY table ORDER BY count DESC"

# 2. If the logs table is too large, prune old entries
kubectl exec -n vapora surrealdb-0 -- \
  surreal sql "DELETE FROM audit_logs WHERE created_at < time::now() - 90d"

# 3. Rebuild indexes to reclaim space
kubectl exec -n vapora surrealdb-0 -- \
  surreal sql "REBUILD INDEX"

# 4. If still large, delete old records from other tables
kubectl exec -n vapora surrealdb-0 -- \
  surreal sql "DELETE FROM tasks WHERE status = 'archived' AND updated_at < time::now() - 1y"

# 5. Monitor size after cleanup
kubectl exec -n vapora surrealdb-0 -- du -sh /var/lib/surrealdb/
```
---

## Scenario 8: Replication Lag (If Using Replicas)

**Cause**: Replica behind primary, network latency

**Duration**: Usually self-healing (seconds to minutes)
**Data Loss**: None

### Detection

```bash
# Check replica lag
kubectl exec -n vapora surrealdb-replica -- \
  surreal sql "SHOW REPLICATION STATUS"

# Look for: "Seconds_Behind_Master" > 5 seconds
```

### Recovery

```bash
# Usually self-healing, but if stuck:

# 1. Check network connectivity
kubectl exec -n vapora surrealdb-replica -- ping surrealdb-primary -c 5

# 2. Restart the replica
kubectl delete pod -n vapora surrealdb-replica

# 3. Monitor the replica catching up
kubectl logs -n vapora surrealdb-replica -f

# 4. Verify replica status
kubectl exec -n vapora surrealdb-replica -- \
  surreal sql "SHOW REPLICATION STATUS"
```
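Minimal container images often ship without `ping`; checking whether the primary's Service has ready endpoints gives a similar signal without relying on binaries inside the container (the Service name `surrealdb-primary` follows the commands above and is an assumption):

```bash
# Confirm the primary Service resolves to at least one ready pod
kubectl get endpoints -n vapora surrealdb-primary
```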
---

## Database Health Checks

The helpers below are Nushell functions; they assume `nu`, `kubectl`, and the `surreal` CLI are available on the operator's workstation.

### Pre-Recovery Verification

```nu
def verify_database_health [] {
  print "=== Database Health Check ==="

  # 1. Connection test
  try {
    surreal sql --conn ws://localhost:8000 "SELECT 1"
  } catch {
    error make {msg: "Cannot connect to database"}
  }

  # 2. Data integrity test
  surreal sql "REBUILD INDEX"
  print "✓ Integrity check passed"

  # 3. Performance test
  surreal sql "SELECT COUNT(*) FROM projects"
  print "✓ Performance acceptable"

  # 4. Replication lag (if applicable)
  # surreal sql "SHOW REPLICATION STATUS"
  # print "✓ No replication lag"

  print "✓ All health checks passed"
}
```
### Post-Recovery Verification

```nu
def verify_recovery_success [] {
  print "=== Post-Recovery Verification ==="

  # 1. Database accessible
  kubectl exec -n vapora surrealdb-0 -- surreal sql "SELECT 1"
  print "✓ Database accessible"

  # 2. All tables present
  kubectl exec -n vapora surrealdb-0 -- surreal sql "SELECT table FROM meta::tb"
  print "✓ All tables present"

  # 3. Record counts reasonable
  kubectl exec -n vapora surrealdb-0 -- surreal sql "SELECT table, count(*) FROM meta::tb"
  print "✓ Record counts verified"

  # 4. Application can connect
  kubectl logs -n vapora deployment/vapora-backend --tail=5 | grep -i connected
  print "✓ Application connected"

  # 5. API operational
  curl http://localhost:8001/api/projects
  print "✓ API operational"
}
```
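Assuming the two Nushell helpers above live in a file such as `recovery-checks.nu` (a hypothetical name), they can be run ad hoc from a shell:

```bash
# Hypothetical invocation of the Nushell helpers above
nu -c "source recovery-checks.nu; verify_database_health"
nu -c "source recovery-checks.nu; verify_recovery_success"
```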
---

## Database Recovery Checklist

### Before Recovery

```
□ Documented failure symptoms
□ Determined root cause
□ Selected appropriate recovery method
□ Located backup to restore
□ Verified backup integrity
□ Notified relevant teams
□ Have runbook available
□ Test environment ready (for testing)
```

### During Recovery

```
□ Followed procedure step-by-step
□ Monitored each step completion
□ Captured any error messages
□ Took notes of timings
□ Did NOT skip verification steps
□ Had backup plans ready
```

### After Recovery

```
□ Verified database accessible
□ Verified data integrity
□ Verified application can connect
□ Checked API endpoints working
□ Monitored error rates
□ Waited for 30 min stability check
□ Documented recovery procedure
□ Identified improvements needed
□ Updated runbooks if needed
```
---

## Recovery Troubleshooting

### Issue: "Cannot connect to database after restore"

**Cause**: Database not fully recovered, network issue

**Solution**:

```bash
# 1. Wait longer (an import can take 15+ minutes)
sleep 60 && kubectl exec -n vapora surrealdb-0 -- surreal sql "SELECT 1"

# 2. Check pod logs
kubectl logs -n vapora surrealdb-0 | tail -50

# 3. Restart the pod
kubectl delete pod -n vapora surrealdb-0

# 4. Check network connectivity
kubectl exec -n vapora surrealdb-0 -- ping localhost
```

### Issue: "Import corrupted data" error

**Cause**: Backup file corrupted or wrong format

**Solution**:

```bash
# 1. Try a different backup
aws s3 ls s3://vapora-backups/database/ | sort | tail -5

# 2. Verify the backup format
file vapora-db-backup.sql
# Should show: text

# 3. Manual inspection
head -20 vapora-db-backup.sql
# Should show SQL format

# 4. Retry the restore with an older backup
```

### Issue: "Database running but data seems wrong"

**Cause**: Restored wrong backup or partial restore

**Solution**:

```bash
# 1. Verify record counts
kubectl exec -n vapora surrealdb-0 -- \
  surreal sql "SELECT table, count(*) FROM meta::tb"

# 2. Compare to the pre-loss baseline
# (from documentation or logs)

# If counts don't match:
# - The wrong backup was used
# - The restore was incomplete
# - Try again with the correct backup
```
---

## Database Recovery Reference

**Recovery Procedure Flowchart**:

```
Database Issue Detected
        ↓
Is it just a pod restart?
  YES → kubectl delete pod surrealdb-0
  NO  → Continue
        ↓
Can queries connect and run?
  YES → Continue with application recovery
  NO  → Continue
        ↓
Is data corrupted (errors in queries)?
  YES → Try REBUILD INDEX
  NO  → Continue
        ↓
Still errors?
  YES → Scale replicas=0, clear PVC, restore from backup
  NO  → Success, monitor for 30 min
```