# Update Existing Infrastructure

**Goal**: Safely update running infrastructure with minimal downtime
**Time**: 15-30 minutes
**Difficulty**: Intermediate

## Overview

This guide covers:

1. Checking for updates
2. Planning update strategies
3. Updating task services
4. Rolling updates
5. Rollback procedures
6. Verification

## Update Strategies

### Strategy 1: In-Place Updates (Fastest)

**Best for**: Non-critical environments, development, staging

```bash
# Direct update without downtime consideration
provisioning t create --infra
```

### Strategy 2: Rolling Updates (Recommended)

**Best for**: Production environments, high availability

```bash
# Update servers one by one
provisioning s update --infra --rolling
```

### Strategy 3: Blue-Green Deployment (Safest)

**Best for**: Critical production, zero-downtime requirements

```bash
# Create new infrastructure, switch traffic, remove old
provisioning ws init -green
# ... configure and deploy
# ... switch traffic
provisioning ws delete -blue
```

## Step 1: Check for Updates

### 1.1 Check All Task Services

```bash
# Check all taskservs for updates
provisioning t check-updates
```

**Expected Output:**

```text
📦 Task Service Update Check:

NAME         CURRENT   LATEST    STATUS
kubernetes   1.29.0    1.30.0    ⬆️  update available
containerd   1.7.13    1.7.13    ✅ up-to-date
cilium       1.14.5    1.15.0    ⬆️  update available
postgres     15.5      16.1      ⬆️  update available
redis        7.2.3     7.2.3     ✅ up-to-date

Updates available: 3
```

### 1.2 Check Specific Task Service

```bash
# Check specific taskserv
provisioning t check-updates kubernetes
```

**Expected Output:**

```text
📦 Kubernetes Update Check:

Current:  1.29.0
Latest:   1.30.0
Status:   ⬆️  Update available

Changelog:
  • Enhanced security features
  • Performance improvements
  • Bug fixes in kube-apiserver
  • New workload resource types

Breaking Changes:
  • None

Recommended: ✅ Safe to update
```

### 1.3 Check Version Status

```bash
# Show detailed version information
provisioning version show
```
**Expected Output:**

```text
📋 Component Versions:

COMPONENT    CURRENT   LATEST    DAYS OLD   STATUS
kubernetes   1.29.0    1.30.0    45         ⬆️  update
containerd   1.7.13    1.7.13    0          ✅ current
cilium       1.14.5    1.15.0    30         ⬆️  update
postgres     15.5      16.1      60         ⬆️  update (major)
redis        7.2.3     7.2.3     0          ✅ current
```

### 1.4 Check for Security Updates

```bash
# Check for security-related updates
provisioning version updates --security-only
```

## Step 2: Plan Your Update

### 2.1 Review Current Configuration

```bash
# Show current infrastructure
provisioning show settings --infra my-production
```

### 2.2 Backup Configuration

```bash
# Create configuration backup
cp -r workspace/infra/my-production workspace/infra/my-production.backup-$(date +%Y%m%d)

# Or use built-in backup
provisioning ws backup my-production
```

**Expected Output:**

```text
✅ Backup created: workspace/backups/my-production-20250930.tar.gz
```

### 2.3 Create Update Plan

```bash
# Generate update plan
provisioning plan update --infra my-production
```

**Expected Output:**

```text
📝 Update Plan for my-production:

Phase 1: Minor Updates (Low Risk)
  • containerd: No update needed
  • redis: No update needed

Phase 2: Patch Updates (Medium Risk)
  • cilium: 1.14.5 → 1.15.0 (estimated 5 minutes)

Phase 3: Major Updates (High Risk - Requires Testing)
  • kubernetes: 1.29.0 → 1.30.0 (estimated 15 minutes)
  • postgres: 15.5 → 16.1 (estimated 10 minutes, may require data migration)

Recommended Order:
  1. Update cilium (low risk)
  2. Update kubernetes (test in staging first)
  3. Update postgres (requires maintenance window)

Total Estimated Time: 30 minutes
Recommended: Test in staging environment first
```

## Step 3: Update Task Services

### 3.1 Update Non-Critical Service (Cilium Example)

#### Dry-Run Update

```bash
# Test update without applying
provisioning t create cilium --infra my-production --check
```

**Expected Output:**

```text
🔍 CHECK MODE: Simulating Cilium update

Current: 1.14.5
Target:  1.15.0

Would perform:
  1. Download Cilium 1.15.0
  2. Update configuration
  3. Rolling restart of Cilium pods
  4. Verify connectivity

Estimated downtime: <1 minute per node

No errors detected. Ready to update.
```

#### Generate Updated Configuration

```bash
# Generate new configuration
provisioning t generate cilium --infra my-production
```

**Expected Output:**

```text
✅ Generated Cilium configuration (version 1.15.0)
   Saved to: workspace/infra/my-production/taskservs/cilium.ncl
```

#### Apply Update

```bash
# Apply update
provisioning t create cilium --infra my-production
```

**Expected Output:**

```text
🚀 Updating Cilium on my-production...

Downloading Cilium 1.15.0... ⏳
✅ Downloaded

Updating configuration... ⏳
✅ Configuration updated

Rolling restart: web-01... ⏳
✅ web-01 updated (Cilium 1.15.0)

Rolling restart: web-02... ⏳
✅ web-02 updated (Cilium 1.15.0)

Verifying connectivity... ⏳
✅ All nodes connected

🎉 Cilium update complete!
   Version: 1.14.5 → 1.15.0
   Downtime: 0 minutes
```

#### Verify Update

```bash
# Verify updated version
provisioning version taskserv cilium
```

**Expected Output:**

```text
📦 Cilium Version Info:

Installed: 1.15.0
Latest:    1.15.0
Status:    ✅ Up-to-date

Nodes:
  ✅ web-01: 1.15.0 (running)
  ✅ web-02: 1.15.0 (running)
```

### 3.2 Update Critical Service (Kubernetes Example)

#### Test in Staging First

```bash
# If you have a staging environment
provisioning t create kubernetes --infra my-staging --check
provisioning t create kubernetes --infra my-staging

# Run integration tests
provisioning test kubernetes --infra my-staging
```

#### Backup Current State

```bash
# Backup Kubernetes state
kubectl get all -A -o yaml > k8s-backup-$(date +%Y%m%d).yaml

# Backup etcd (if using external etcd)
provisioning t backup kubernetes --infra my-production
```

#### Schedule Maintenance Window

```bash
# Set maintenance mode (optional, if supported)
provisioning maintenance enable --infra my-production --duration 30m
```

#### Update Kubernetes

```bash
# Update control plane first
provisioning t create kubernetes --infra my-production --control-plane-only
```

**Expected Output:**

```text
🚀 Updating Kubernetes control plane on my-production...

Draining control plane: web-01... ⏳
✅ web-01 drained

Updating control plane: web-01... ⏳
✅ web-01 updated (Kubernetes 1.30.0)

Uncordoning: web-01... ⏳
✅ web-01 ready

Verifying control plane... ⏳
✅ Control plane healthy

🎉 Control plane update complete!
```

```bash
# Update worker nodes one by one
provisioning t create kubernetes --infra my-production --workers-only --rolling
```

**Expected Output:**

```text
🚀 Updating Kubernetes workers on my-production...

Rolling update: web-02...
  Draining... ⏳
  ✅ Drained (pods rescheduled)

  Updating... ⏳
  ✅ Updated (Kubernetes 1.30.0)

  Uncordoning... ⏳
  ✅ Ready

  Waiting for pods to stabilize... ⏳
  ✅ All pods running

🎉 Worker update complete!
   Updated: web-02
   Version: 1.30.0
```

#### Verify Update

```bash
# Verify Kubernetes cluster
kubectl get nodes
provisioning version taskserv kubernetes
```

**Expected Output:**

```text
NAME     STATUS   ROLES           AGE   VERSION
web-01   Ready    control-plane   30d   v1.30.0
web-02   Ready    <none>          30d   v1.30.0
```

```bash
# Run smoke tests
provisioning test kubernetes --infra my-production
```

### 3.3 Update Database (PostgreSQL Example)

⚠️ **WARNING**: Database updates may require data migration. Always back up first!

#### Backup Database

```bash
# Backup PostgreSQL database
provisioning t backup postgres --infra my-production
```

**Expected Output:**

```text
🗄️ Backing up PostgreSQL...

Creating dump: my-production-postgres-20250930.sql... ⏳
✅ Dump created (2.3 GB)

Compressing... ⏳
✅ Compressed (450 MB)

Saved to: workspace/backups/postgres/my-production-20250930.sql.gz
```

#### Check Compatibility

```bash
# Check if data migration is needed
provisioning t check-migration postgres --from 15.5 --to 16.1
```

**Expected Output:**

```text
🔍 PostgreSQL Migration Check:

From: 15.5
To:   16.1

Migration Required: ✅ Yes (major version change)

Steps Required:
  1. Dump database with pg_dump
  2. Stop PostgreSQL 15.5
  3. Install PostgreSQL 16.1
  4. Initialize new data directory
  5. Restore from dump

Estimated Time: 15-30 minutes (depending on data size)
Estimated Downtime: 15-30 minutes

Recommended: Use streaming replication for zero-downtime upgrade
```

#### Perform Update

```bash
# Update PostgreSQL (with automatic migration)
provisioning t create postgres --infra my-production --migrate
```

**Expected Output:**

```text
🚀 Updating PostgreSQL on my-production...

⚠️  Major version upgrade detected (15.5 → 16.1)
   Automatic migration will be performed

Dumping database... ⏳
✅ Database dumped (2.3 GB)

Stopping PostgreSQL 15.5... ⏳
✅ Stopped

Installing PostgreSQL 16.1... ⏳
✅ Installed

Initializing new data directory... ⏳
✅ Initialized

Restoring database... ⏳
✅ Restored (2.3 GB)

Starting PostgreSQL 16.1... ⏳
✅ Started

Verifying data integrity... ⏳
✅ All tables verified

🎉 PostgreSQL update complete!
   Version: 15.5 → 16.1
   Downtime: 18 minutes
```

#### Verify Update

```bash
# Verify PostgreSQL
provisioning version taskserv postgres
ssh db-01 "psql --version"
```

## Step 4: Update Multiple Services

### 4.1 Batch Update (Sequentially)

```bash
# Update multiple taskservs one by one
provisioning t update --infra my-production --taskservs cilium,containerd,redis
```

**Expected Output:**

```text
🚀 Updating 3 taskservs on my-production...

[1/3] Updating cilium... ⏳
✅ cilium updated (1.15.0)

[2/3] Updating containerd... ⏳
✅ containerd updated (1.7.14)

[3/3] Updating redis... ⏳
✅ redis updated (7.2.4)

🎉 All updates complete!
   Updated: 3 taskservs
   Total time: 8 minutes
```

### 4.2 Parallel Update (Non-Dependent Services)

```bash
# Update taskservs in parallel (if they don't depend on each other)
provisioning t update --infra my-production --taskservs redis,postgres --parallel
```

**Expected Output:**

```text
🚀 Updating 2 taskservs in parallel on my-production...

redis: Updating... ⏳
postgres: Updating... ⏳

redis: ✅ Updated (7.2.4)
postgres: ✅ Updated (16.1)

🎉 All updates complete!
   Updated: 2 taskservs
   Total time: 3 minutes (parallel)
```

## Step 5: Update Server Configuration

### 5.1 Update Server Resources

```bash
# Edit server configuration
provisioning sops workspace/infra/my-production/servers.ncl
```

**Example: Upgrade server plan**

```nickel
# Before
{
  name = "web-01",
  plan = "1xCPU-2 GB"  # Old plan
}

# After
{
  name = "web-01",
  plan = "2xCPU-4 GB"  # New plan
}
```

```bash
# Apply server update
provisioning s update --infra my-production --check
provisioning s update --infra my-production
```

### 5.2 Update Server OS

```bash
# Update operating system packages
provisioning s update --infra my-production --os-update
```

**Expected Output:**

```text
🚀 Updating OS packages on my-production servers...

web-01: Updating packages... ⏳
✅ web-01: 24 packages updated

web-02: Updating packages... ⏳
✅ web-02: 24 packages updated

db-01: Updating packages... ⏳
✅ db-01: 24 packages updated

🎉 OS updates complete!
```

## Step 6: Rollback Procedures

### 6.1 Rollback Task Service

If an update fails or causes issues:

```bash
# Rollback to previous version
provisioning t rollback cilium --infra my-production
```

**Expected Output:**

```text
🔄 Rolling back Cilium on my-production...

Current: 1.15.0
Target:  1.14.5 (previous version)

Rolling back: web-01... ⏳
✅ web-01 rolled back

Rolling back: web-02... ⏳
✅ web-02 rolled back

Verifying connectivity... ⏳
✅ All nodes connected

🎉 Rollback complete!
   Version: 1.15.0 → 1.14.5
```

### 6.2 Rollback from Backup

```bash
# Restore configuration from backup
provisioning ws restore my-production --from workspace/backups/my-production-20250930.tar.gz
```

### 6.3 Emergency Rollback

```bash
# Complete infrastructure rollback
provisioning rollback --infra my-production --to-snapshot
```

## Step 7: Post-Update Verification

### 7.1 Verify All Components

```bash
# Check overall health
provisioning health --infra my-production
```

**Expected Output:**

```text
🏥 Health Check: my-production

Servers:
  ✅ web-01: Healthy
  ✅ web-02: Healthy
  ✅ db-01: Healthy

Task Services:
  ✅ kubernetes: 1.30.0 (healthy)
  ✅ containerd: 1.7.13 (healthy)
  ✅ cilium: 1.15.0 (healthy)
  ✅ postgres: 16.1 (healthy)

Clusters:
  ✅ buildkit: 2/2 replicas (healthy)

Overall Status: ✅ All systems healthy
```

### 7.2 Verify Version Updates

```bash
# Verify all versions are updated
provisioning version show
```

### 7.3 Run Integration Tests

```bash
# Run comprehensive tests
provisioning test all --infra my-production
```

**Expected Output:**

```text
🧪 Running Integration Tests...

[1/5] Server connectivity... ⏳
✅ All servers reachable

[2/5] Kubernetes health... ⏳
✅ All nodes ready, all pods running

[3/5] Network connectivity... ⏳
✅ All services reachable

[4/5] Database connectivity... ⏳
✅ PostgreSQL responsive

[5/5] Application health... ⏳
✅ All applications healthy

🎉 All tests passed!
```

### 7.4 Monitor for Issues

```bash
# Monitor logs for errors
provisioning logs --infra my-production --follow --level error
```

## Update Checklist

Use this checklist for production updates:

- [ ] Check for available updates
- [ ] Review changelog and breaking changes
- [ ] Create configuration backup
- [ ] Test update in staging environment
- [ ] Schedule maintenance window
- [ ] Notify team/users of maintenance
- [ ] Update non-critical services first
- [ ] Verify each update before proceeding
- [ ] Update critical services with rolling updates
- [ ] Backup database before major updates
- [ ] Verify all components after update
- [ ] Run integration tests
- [ ] Monitor for issues (30 minutes minimum)
- [ ] Document any issues encountered
- [ ] Close maintenance window

## Common Update Scenarios

### Scenario 1: Minor Security Patch

```bash
# Quick security update
provisioning t check-updates --security-only
provisioning t update --infra my-production --security-patches --yes
```

### Scenario 2: Major Version Upgrade

```bash
# Careful major version update
provisioning ws backup my-production
provisioning t check-migration --from X.Y --to X+1.Y
provisioning t create --infra my-production --migrate
provisioning test all --infra my-production
```

### Scenario 3: Emergency Hotfix

```bash
# Apply critical hotfix immediately
provisioning t create --infra my-production --hotfix --yes
```

## Troubleshooting Updates

### Issue: Update fails mid-process

**Solution:**

```bash
# Check update status
provisioning t status --infra my-production

# Resume failed update
provisioning t update --infra my-production --resume

# Or rollback
provisioning t rollback --infra my-production
```

### Issue: Service not starting after update

**Solution:**

```bash
# Check logs
provisioning logs --infra my-production

# Verify configuration
provisioning t validate --infra my-production

# Rollback if necessary
provisioning t rollback --infra my-production
```

### Issue: Data migration fails

**Solution:**

```bash
# Check migration logs
provisioning t migration-logs --infra my-production

# Restore from backup
provisioning t restore --infra my-production --from
```

## Best Practices

1. **Always Test First**: Test updates in staging before production
2. **Backup Everything**: Create backups before any update
3. **Update Gradually**: Update one service at a time
4. **Monitor Closely**: Watch for errors after each update
5. **Have Rollback Plan**: Always have a rollback strategy
6. **Document Changes**: Keep update logs for reference
7. **Schedule Wisely**: Update during low-traffic periods
8. **Verify Thoroughly**: Run tests after each update

## Next Steps

- **[Customize Guide](customize-infrastructure.md)** - Customize your infrastructure
- **[From Scratch Guide](from-scratch.md)** - Deploy new infrastructure
- **[Workflow Guide](../development/workflow.md)** - Automate with workflows

## Quick Reference

```bash
# Update workflow
provisioning t check-updates
provisioning ws backup my-production
provisioning t create --infra my-production --check
provisioning t create --infra my-production
provisioning version taskserv
provisioning health --infra my-production
provisioning test all --infra my-production
```

---

*This guide is part of the provisioning project documentation. Last updated: 2025-09-30*
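
The Quick Reference workflow can also be chained so that it stops at the first failing step, which keeps a failed dry run from being followed by a real update. The following is a minimal sketch, not part of the provisioning CLI: `update_workflow`, `run_step`, `PROVISIONING_BIN`, and `DRY_RUN` are hypothetical helper names introduced here for illustration, and each command uses the single-taskserv form shown earlier in this guide (for example `provisioning t create cilium --infra my-production --check`).

```bash
#!/usr/bin/env bash
# Sketch only: wraps the Quick Reference steps for one taskserv.
# PROVISIONING_BIN and DRY_RUN are hypothetical variables of this script,
# not options of the provisioning CLI.

run_step() {
  # Print the command, then execute it unless DRY_RUN=1.
  echo "+ $*"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    return 0
  fi
  "$@"
}

update_workflow() {
  # $1 = infrastructure name, $2 = taskserv name
  local infra="$1"
  local taskserv="$2"
  local bin="${PROVISIONING_BIN:-provisioning}"

  # Each step must succeed before the next one runs.
  run_step "$bin" t check-updates "$taskserv" || return 1
  run_step "$bin" ws backup "$infra" || return 1
  run_step "$bin" t create "$taskserv" --infra "$infra" --check || return 1
  run_step "$bin" t create "$taskserv" --infra "$infra" || return 1
  run_step "$bin" version taskserv "$taskserv" || return 1
  run_step "$bin" health --infra "$infra" || return 1
  run_step "$bin" test all --infra "$infra" || return 1
}
```

Running `DRY_RUN=1 update_workflow my-production cilium` prints the seven commands without executing them, which is a convenient way to review the sequence before a maintenance window. Rollback is deliberately left out of the sketch so that the decision to roll back stays with the operator.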