Pre-Deployment Checklist
Critical verification steps before any VAPORA deployment to production or staging.
24 Hours Before Deployment
Communication & Scheduling
- Schedule deployment with team (record in calendar/ticket)
- Post in #deployments channel: "Deployment scheduled for [DATE TIME UTC]"
- Identify on-call engineer for deployment period
- Brief on-call on deployment plan and rollback procedure
- Ensure affected teams (support, product, etc.) are notified
- Verify no other critical infrastructure changes are scheduled in the same time window
Change Documentation
- Create GitHub issue or ticket tracking the deployment (see the CLI sketch after this list)
- Document: what's changing (configs, manifests, versions)
- Document: why (bug fix, feature, performance, security)
- Document: rollback plan (revision number or previous config)
- Document: success criteria (what indicates successful deployment)
- Document: estimated duration (usually 5-15 minutes)
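If you track deployments in GitHub, the tracking issue can be opened from the command line. A sketch, assuming the GitHub CLI (`gh`) is installed and authenticated; the title, label, and body fields are illustrative placeholders:

```bash
# Open a tracking issue with the required documentation fields stubbed in
gh issue create \
  --title "Deployment: VAPORA $(date -u +%Y-%m-%d)" \
  --label deployment \
  --body "What: ...
Why: ...
Rollback plan: ...
Success criteria: ...
Estimated duration: 5-15 minutes"
```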
Code Review & Validation
- All provisioning changes merged and code reviewed
- Confirm `main` branch has latest changes
- Run validation locally: `nu scripts/validate-config.nu --mode enterprise` (a loop covering all three modes follows this list)
- Verify all 3 modes validate without errors or critical warnings
- Check git log for unexpected commits
- Review artifact generation: ensure configs are correct
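To check all three modes in one pass rather than invoking the validator by hand, a minimal loop (a sketch; the mode names match the `vapora-{solo,multiuser,enterprise}` artifacts generated later in this checklist):

```bash
# Run the validator once per deployment mode and report any failures
for mode in solo multiuser enterprise; do
  nu scripts/validate-config.nu --mode "$mode" || echo "✗ validation failed for mode: $mode"
done
```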
4 Hours Before Deployment
Environment Verification
Staging Environment
- Access staging Kubernetes cluster: `kubectl cluster-info`
- Verify cluster is healthy: `kubectl get nodes` (all Ready)
- Check namespace exists: `kubectl get namespace vapora`
- Verify current deployments: `kubectl get deployments -n vapora`
- Check ConfigMap is up to date: `kubectl get configmap -n vapora -o yaml | head -20`
Production Environment (if applicable)
- Access production Kubernetes cluster: `kubectl cluster-info`
- Verify all nodes healthy: `kubectl get nodes` (all Ready; a scripted check follows this list)
- Check current resource usage: `kubectl top nodes` (not near capacity)
- Verify current deployments: `kubectl get deployments -n vapora`
- Check pod status: `kubectl get pods -n vapora` (all Running)
- Verify recent events: `kubectl get events -n vapora --sort-by='.lastTimestamp' | tail -10`
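A quick scripted form of the node-readiness check (a sketch; it parses the STATUS column of `kubectl get nodes`, so treat it as a convenience rather than an authoritative probe):

```bash
# Exit non-zero if any node reports a status other than Ready
kubectl get nodes --no-headers | awk '$2 != "Ready" { print "not ready:", $1; bad = 1 } END { exit bad }'
```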
Health Baseline
- Record current metrics before deployment (a snapshot script follows this list):
  - CPU usage per deployment
  - Memory usage per deployment
  - Request latency (p50, p95, p99)
  - Error rate (4xx, 5xx)
  - Queue depth (if applicable)
- Verify services are responsive:
  ```bash
  curl http://localhost:8001/health -H "Authorization: Bearer $TOKEN"
  curl http://localhost:8001/api/projects
  ```
- Check logs for recent errors:
  ```bash
  kubectl logs deployment/vapora-backend -n vapora --tail=50
  kubectl logs deployment/vapora-agents -n vapora --tail=50
  ```
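To make the before/after comparison concrete, a small snapshot script (a sketch; the file names and the set of metrics captured are illustrative, not a fixed convention):

```bash
# Capture a timestamped resource baseline to diff against post-deployment state
STAMP=$(date -u +%Y%m%dT%H%M%SZ)
kubectl top nodes                          > "baseline-nodes-$STAMP.txt"
kubectl top pods -n vapora                 > "baseline-pods-$STAMP.txt"
kubectl get deployments -n vapora -o wide  > "baseline-deployments-$STAMP.txt"
echo "Baseline written to baseline-*-$STAMP.txt"
```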
Infrastructure Check
- Verify storage is not near capacity: `df -h /var/lib/vapora`
- Check database health: `kubectl exec -n vapora <pod> -- surreal info`
- Verify backups are recent (within 24 hours)
- Check SSL certificate expiration (an automated expiry check follows this list):
  ```bash
  # s_client alone doesn't print validity dates; pipe the certificate into x509
  openssl s_client -connect api.vapora.com:443 -servername api.vapora.com </dev/null 2>/dev/null \
    | openssl x509 -noout -dates
  ```
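For a pass/fail result instead of eyeballing dates, `x509 -checkend` works (a sketch; the 30-day window is an arbitrary choice):

```bash
# Succeeds only if the certificate is still valid 30 days from now
openssl s_client -connect api.vapora.com:443 -servername api.vapora.com </dev/null 2>/dev/null \
  | openssl x509 -noout -checkend $((30 * 24 * 3600)) \
  && echo "✓ certificate valid for 30+ days" \
  || echo "✗ certificate expires within 30 days"
```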
2 Hours Before Deployment
Artifact Preparation
- Trigger validation in CI/CD pipeline
- Wait for artifact generation to complete
- Download artifacts from pipeline:
  ```bash
  # From GitHub Actions or Woodpecker UI
  # Download: deployment-artifacts.zip
  ```
- Verify artifact contents:
  ```bash
  unzip deployment-artifacts.zip
  ls -la
  # Should contain:
  # - configmap.yaml
  # - deployment.yaml
  # - docker-compose.yml
  # - vapora-{solo,multiuser,enterprise}.{toml,yaml,json}
  ```
- Validate manifest syntax:
  ```bash
  yq eval '.' configmap.yaml > /dev/null && echo "✓ ConfigMap valid"
  yq eval '.' deployment.yaml > /dev/null && echo "✓ Deployment valid"
  ```
Test in Staging
- Perform dry-run deployment to staging cluster:
  ```bash
  kubectl apply -f configmap.yaml --dry-run=server -n vapora
  kubectl apply -f deployment.yaml --dry-run=server -n vapora
  ```
- Review dry-run output for any warnings or errors
- If a test deployment is available, do an actual staging deployment and verify:
  ```bash
  kubectl get deployments -n vapora
  kubectl get pods -n vapora
  kubectl logs deployment/vapora-backend -n vapora --tail=5
  ```
- Test health endpoints on staging (see the smoke-test sketch after this list)
- Run smoke tests against staging (if available)
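A minimal health smoke test if nothing more formal exists (a sketch; it reuses the health endpoint from the baseline section and assumes a 200 response means healthy):

```bash
# Probe the health endpoint and fail loudly on anything but HTTP 200
code=$(curl -s -o /dev/null -w '%{http_code}' http://localhost:8001/health)
if [ "$code" = "200" ]; then
  echo "✓ health endpoint OK"
else
  echo "✗ health endpoint returned $code"
  exit 1
fi
```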
Rollback Plan Verification
- Document current deployment revisions:
  ```bash
  kubectl rollout history deployment/vapora-backend -n vapora
  # Record the highest revision number
  ```
- Create backup of current ConfigMap:
  ```bash
  kubectl get configmap -n vapora vapora-config -o yaml > configmap-backup.yaml
  ```
- Test rollback procedure on staging (if safe):
  ```bash
  # Record current revision
  CURRENT_REV=$(kubectl rollout history deployment/vapora-backend -n vapora | tail -1 | awk '{print $1}')

  # Test undo
  kubectl rollout undo deployment/vapora-backend -n vapora

  # Verify rollback
  kubectl get deployment vapora-backend -n vapora -o yaml | grep image

  # Restore to current
  kubectl rollout undo deployment/vapora-backend -n vapora --to-revision=$CURRENT_REV
  ```
- Confirm rollback command is documented in ticket/issue (a ConfigMap-restore sketch follows this list)
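If a bad ConfigMap rather than a bad image turns out to be the problem, the backup taken above can be restored directly (a sketch; `rollout restart` is one way to make pods pick up the restored config):

```bash
# Reapply the pre-deployment ConfigMap backup
kubectl apply -f configmap-backup.yaml -n vapora

# Restart the deployment so pods reload the restored configuration
kubectl rollout restart deployment/vapora-backend -n vapora
```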
1 Hour Before Deployment
Final Checks
- Confirm all prerequisites met:
  - Code merged to main
  - Artifacts generated and validated
  - Staging deployment tested
  - Rollback plan documented
  - Team notified
Communication Setup
- Set status page to "Maintenance Mode" (if public):
  "VAPORA maintenance deployment starting at HH:MM UTC. Expected duration: 10 minutes. Services may be briefly unavailable."
- Join #deployments Slack channel
- Prepare message: "🚀 Deployment starting now. Will update every 2 minutes."
- Have on-call engineer monitoring
- Verify monitoring/alerting dashboards are accessible
Access Verification
- Verify kubeconfig is valid and up to date:
  ```bash
  kubectl cluster-info
  kubectl get nodes
  ```
- Verify kubectl version compatibility:
  ```bash
  kubectl version
  # Client should be within one minor version of the server
  ```
- Test write access to cluster:
  ```bash
  kubectl auth can-i create deployments --namespace=vapora
  # Should return "yes"
  ```
- Verify docker/docker-compose access (if Docker deployment)
- Verify Slack webhook is working (test send a message; see the sketch after this list)
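A one-off webhook test (a sketch; assumes a standard Slack incoming webhook URL exported as `SLACK_WEBHOOK_URL`):

```bash
# Post a throwaway message to confirm the webhook is live
curl -s -X POST -H 'Content-Type: application/json' \
  -d '{"text": "Pre-deployment webhook test - please ignore"}' \
  "$SLACK_WEBHOOK_URL"
```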
15 Minutes Before Deployment
Final Go/No-Go Decision
STOP HERE and make the final decision to proceed or reschedule:
Proceed IF:
- ✅ All checklist items above completed
- ✅ No critical issues found during testing
- ✅ Staging deployment successful
- ✅ Team ready and monitoring
- ✅ Rollback plan clear and tested
- ✅ Within designated maintenance window
RESCHEDULE IF:
- ❌ Any critical issues discovered
- ❌ Staging tests failed
- ❌ Team member unavailable
- ❌ Production issues detected
- ❌ Unexpected changes in code/configs
Final Notifications
If proceeding:
- Post to #deployments: "🚀 Deployment starting in 5 minutes"
- Alert on-call engineer: "Ready to start - confirm you're monitoring"
- Have rollback plan visible and accessible
- Open monitoring dashboard showing current metrics
Terminal Setup
- Open terminal with kubeconfig configured:
  ```bash
  export KUBECONFIG=/path/to/production/kubeconfig
  kubectl cluster-info  # Verify connected to production
  ```
- Open second terminal for tailing logs:
  ```bash
  kubectl logs -f deployment/vapora-backend -n vapora
  ```
- Have rollback commands ready:
  ```bash
  # For quick rollback if needed
  kubectl rollout undo deployment/vapora-backend -n vapora
  kubectl rollout undo deployment/vapora-agents -n vapora
  kubectl rollout undo deployment/vapora-llm-router -n vapora
  ```
- Prepare metrics check script:
  ```bash
  watch kubectl top pods -n vapora
  watch kubectl get pods -n vapora
  ```
Success Criteria Verification
Document what "success" looks like for this deployment:
- All three deployments have updated image IDs
- All pods reach "Ready" state within 5 minutes (a rollout wait sketch follows this list)
- No pod restarts: `kubectl get pods -n vapora --watch` (RESTARTS column not increasing)
- No error logs in first 2 minutes
- Health endpoints respond (200 OK)
- API endpoints respond to test requests
- Metrics show normal resource usage
- No alerts triggered
- Support team reports no user impact
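One way to hold the rollout to the five-minute criterion above (a sketch; `--timeout` makes the wait fail instead of hanging, and the deployment names match the rollback commands in Terminal Setup):

```bash
# Block until each deployment is fully rolled out, or fail after 5 minutes
for d in vapora-backend vapora-agents vapora-llm-router; do
  kubectl rollout status "deployment/$d" -n vapora --timeout=300s || echo "✗ rollout stalled: $d"
done
```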
Team Roles During Deployment
Deployment Lead
- Executes deployment commands
- Monitors progress
- Communicates status updates
- Decides to proceed/rollback
On-Call Engineer
- Monitors dashboards and alerts
- Watches for anomalies
- Prepares for rollback if needed
- Available for emergency decisions
Communications Lead (optional)
- Updates #deployments channel
- Notifies support/product teams
- Updates status page if public
- Handles external communication
Backup Person
- Monitors for issues
- Ready to assist with troubleshooting
- Prepares rollback procedures
- Escalates if needed
Common Issues to Watch For
⚠️ Pod CrashLoopBackOff
- Indicates config or image issue
- Check pod logs: `kubectl logs <pod> -n vapora` (see the `--previous` sketch after this list)
- Check events: `kubectl describe pod <pod> -n vapora`
- Action: Rollback immediately
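For CrashLoopBackOff specifically, the current container often has no useful logs; `--previous` pulls logs from the crashed instance (a sketch):

```bash
# Logs from the previous (crashed) container, which usually hold the actual error
kubectl logs <pod> -n vapora --previous --tail=50
```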
⚠️ Pending Pods (not starting)
- Check resource availability: `kubectl describe pod <pod>` (a node-capacity check follows this list)
- Check node capacity
- Action: Investigate or rollback if resource exhausted
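To see whether nodes are simply out of allocatable CPU or memory, the "Allocated resources" section of node describe output summarizes requests vs capacity (a sketch; the `-A 7` window is a rough guess at the table size):

```bash
# Summarize requested vs allocatable resources per node
kubectl describe nodes | grep -A 7 "Allocated resources"
```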
⚠️ High Error Rate
- Check application logs
- Compare with baseline errors
- Action: If >10% error increase, rollback
⚠️ Database Connection Errors
- Check ConfigMap has correct database URL (see the grep sketch after this list)
- Verify network connectivity to database
- Action: Check ConfigMap, fix and reapply if needed
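A quick way to confirm what database settings the pods actually see (a sketch; the ConfigMap name matches the backup step earlier, but the key names matched by the grep are assumptions):

```bash
# Inspect database-related keys in the live ConfigMap
kubectl get configmap vapora-config -n vapora -o yaml | grep -iE 'database|url|host'
```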
⚠️ Memory or CPU Spike
- Monitor trends (sudden spike vs gradual)
- Check if within expected range for new code
- Action: Rollback if resource limits exceeded
Post-Deployment Documentation
After deployment completes, record:
- Deployment start time (UTC)
- Deployment end time (UTC)
- Total duration
- Any issues encountered and resolution
- Rollback performed (Y/N)
- Metrics before/after (CPU, memory, latency, errors)
- Team members involved
- Blockers or lessons learned
Sign-Off
Use this template for deployment issue/ticket:
```text
DEPLOYMENT COMPLETED

✓ All checks passed
✓ Deployment successful
✓ All pods running
✓ Health checks passing
✓ No user impact

Deployed by: [Name]
Start time: [UTC]
Duration: [X minutes]
Rollback needed: No

Metrics:
- Latency (p99): [X]ms
- Error rate: [X]%
- Pod restarts: 0

Next deployment: [Date/Time]
```