# Update Existing Infrastructure

**Goal**: Safely update running infrastructure with minimal downtime
**Time**: 15-30 minutes
**Difficulty**: Intermediate

## Overview

This guide covers:

1. Checking for updates
2. Planning update strategies
3. Updating task services
4. Rolling updates
5. Rollback procedures
6. Verification

## Update Strategies

### Strategy 1: In-Place Updates (Fastest)

**Best for**: Non-critical environments, development, staging

```
# Update in place; no steps are taken to avoid downtime
provisioning t create --infra
```

### Strategy 2: Rolling Updates (Recommended)

**Best for**: Production environments, high availability

```
# Update servers one by one
provisioning s update --infra --rolling
```

### Strategy 3: Blue-Green Deployment (Safest)

**Best for**: Critical production, zero-downtime requirements

```
# Create new infrastructure, switch traffic, remove old
provisioning ws init -green
# ... configure and deploy
# ... switch traffic
provisioning ws delete -blue
```

## Step 1: Check for Updates

### 1.1 Check All Task Services

```
# Check all taskservs for updates
provisioning t check-updates
```

**Expected Output:**

```
📦 Task Service Update Check:

NAME        CURRENT  LATEST  STATUS
kubernetes  1.29.0   1.30.0  ⬆️ update available
containerd  1.7.13   1.7.13  ✅ up-to-date
cilium      1.14.5   1.15.0  ⬆️ update available
postgres    15.5     16.1    ⬆️ update available
redis       7.2.3    7.2.3   ✅ up-to-date

Updates available: 3
```

### 1.2 Check Specific Task Service

```
# Check specific taskserv
provisioning t check-updates kubernetes
```

**Expected Output:**

```
📦 Kubernetes Update Check:

Current: 1.29.0
Latest:  1.30.0
Status:  ⬆️ Update available

Changelog:
  • Enhanced security features
  • Performance improvements
  • Bug fixes in kube-apiserver
  • New workload resource types

Breaking Changes:
  • None

Recommended: ✅ Safe to update
```

### 1.3 Check Version Status

```
# Show detailed version information
provisioning version show
```

**Expected Output:**

```
📋 Component Versions:

COMPONENT   CURRENT  LATEST  DAYS OLD  STATUS
kubernetes  1.29.0   1.30.0  45        ⬆️ update
containerd  1.7.13   1.7.13  0         ✅ current
cilium      1.14.5   1.15.0  30        ⬆️ update
postgres    15.5     16.1    60        ⬆️ update (major)
redis       7.2.3    7.2.3   0         ✅ current
```

### 1.4 Check for Security Updates

```
# Check for security-related updates
provisioning version updates --security-only
```

## Step 2: Plan Your Update

### 2.1 Review Current Configuration

```
# Show current infrastructure
provisioning show settings --infra my-production
```

### 2.2 Backup Configuration

```
# Create configuration backup
cp -r workspace/infra/my-production workspace/infra/my-production.backup-$(date +%Y%m%d)

# Or use built-in backup
provisioning ws backup my-production
```

**Expected Output:**

```
✅ Backup created: workspace/backups/my-production-20250930.tar.gz
```

### 2.3 Create Update Plan

```
# Generate update plan
provisioning plan update --infra my-production
```

**Expected Output:**

```
📝 Update Plan for my-production:

Phase 1: Minor Updates (Low Risk)
  • containerd: No update needed
  • redis: No update needed

Phase 2: Patch Updates (Medium Risk)
  • cilium: 1.14.5 → 1.15.0 (estimated 5 minutes)

Phase 3: Major Updates (High Risk - Requires Testing)
  • kubernetes: 1.29.0 → 1.30.0 (estimated 15 minutes)
  • postgres: 15.5 → 16.1 (estimated 10 minutes, may require data migration)

Recommended Order:
  1. Update cilium (low risk)
  2. Update kubernetes (test in staging first)
  3. Update postgres (requires maintenance window)

Total Estimated Time: 30 minutes
Recommended: Test in staging environment first
```

## Step 3: Update Task Services

### 3.1 Update Non-Critical Service (Cilium Example)

#### Dry-Run Update

```
# Test update without applying
provisioning t create cilium --infra my-production --check
```

**Expected Output:**

```
🔍 CHECK MODE: Simulating Cilium update

Current: 1.14.5
Target:  1.15.0

Would perform:
  1. Download Cilium 1.15.0
  2. Update configuration
  3. Rolling restart of Cilium pods
  4. Verify connectivity

Estimated downtime: <1 minute per node
No errors detected. Ready to update.
```

#### Generate Updated Configuration

```
# Generate new configuration
provisioning t generate cilium --infra my-production
```

**Expected Output:**

```
✅ Generated Cilium configuration (version 1.15.0)
   Saved to: workspace/infra/my-production/taskservs/cilium.ncl
```

#### Apply Update

```
# Apply update
provisioning t create cilium --infra my-production
```

**Expected Output:**

```
🚀 Updating Cilium on my-production...

Downloading Cilium 1.15.0... ⏳
✅ Downloaded

Updating configuration... ⏳
✅ Configuration updated

Rolling restart: web-01... ⏳
✅ web-01 updated (Cilium 1.15.0)

Rolling restart: web-02... ⏳
✅ web-02 updated (Cilium 1.15.0)

Verifying connectivity... ⏳
✅ All nodes connected

🎉 Cilium update complete!
   Version: 1.14.5 → 1.15.0
   Downtime: 0 minutes
```

#### Verify Update

```
# Verify updated version
provisioning version taskserv cilium
```

**Expected Output:**

```
📦 Cilium Version Info:

Installed: 1.15.0
Latest:    1.15.0
Status:    ✅ Up-to-date

Nodes:
  ✅ web-01: 1.15.0 (running)
  ✅ web-02: 1.15.0 (running)
```

### 3.2 Update Critical Service (Kubernetes Example)

#### Test in Staging First

```
# If you have a staging environment
provisioning t create kubernetes --infra my-staging --check
provisioning t create kubernetes --infra my-staging

# Run integration tests
provisioning test kubernetes --infra my-staging
```

#### Backup Current State

```
# Backup Kubernetes state
# Note: "get all" covers workload resources only; ConfigMaps, Secrets,
# PersistentVolumeClaims and custom resources are not included
kubectl get all -A -o yaml > k8s-backup-$(date +%Y%m%d).yaml

# Backup etcd (if using external etcd)
provisioning t backup kubernetes --infra my-production
```

#### Schedule Maintenance Window

```
# Set maintenance mode (optional, if supported)
provisioning maintenance enable --infra my-production --duration 30m
```

#### Update Kubernetes

```
# Update control plane first
provisioning t create kubernetes --infra my-production --control-plane-only
```

**Expected Output:**

```
🚀 Updating Kubernetes control plane on my-production...

Draining control plane: web-01... ⏳
✅ web-01 drained

Updating control plane: web-01... ⏳
✅ web-01 updated (Kubernetes 1.30.0)

Uncordoning: web-01... ⏳
✅ web-01 ready

Verifying control plane... ⏳
✅ Control plane healthy

🎉 Control plane update complete!
```

```
# Update worker nodes one by one
provisioning t create kubernetes --infra my-production --workers-only --rolling
```

**Expected Output:**

```
🚀 Updating Kubernetes workers on my-production...

Rolling update: web-02...
  Draining... ⏳
  ✅ Drained (pods rescheduled)

  Updating... ⏳
  ✅ Updated (Kubernetes 1.30.0)

  Uncordoning... ⏳
  ✅ Ready

  Waiting for pods to stabilize... ⏳
  ✅ All pods running

🎉 Worker update complete!
   Updated: web-02
   Version: 1.30.0
```

#### Verify Update

```
# Verify Kubernetes cluster
kubectl get nodes
provisioning version taskserv kubernetes
```

**Expected Output:**

```
NAME     STATUS   ROLES           AGE   VERSION
web-01   Ready    control-plane   30d   v1.30.0
web-02   Ready    <none>          30d   v1.30.0
```

```
# Run smoke tests
provisioning test kubernetes --infra my-production
```

### 3.3 Update Database (PostgreSQL Example)

⚠️ **WARNING**: Database updates may require data migration. Always back up first!

#### Backup Database

```
# Backup PostgreSQL database
provisioning t backup postgres --infra my-production
```

**Expected Output:**

```
🗄️ Backing up PostgreSQL...

Creating dump: my-production-postgres-20250930.sql... ⏳
✅ Dump created (2.3 GB)

Compressing... ⏳
✅ Compressed (450 MB)

Saved to: workspace/backups/postgres/my-production-20250930.sql.gz
```

#### Check Compatibility

```
# Check if data migration is needed
provisioning t check-migration postgres --from 15.5 --to 16.1
```

**Expected Output:**

```
🔍 PostgreSQL Migration Check:

From: 15.5
To:   16.1

Migration Required: ✅ Yes (major version change)

Steps Required:
  1. Dump database with pg_dump
  2. Stop PostgreSQL 15.5
  3. Install PostgreSQL 16.1
  4. Initialize new data directory
  5. Restore from dump

Estimated Time: 15-30 minutes (depending on data size)
Estimated Downtime: 15-30 minutes

Recommended: Use logical replication for a near-zero-downtime upgrade
```

#### Perform Update

```
# Update PostgreSQL (with automatic migration)
provisioning t create postgres --infra my-production --migrate
```

**Expected Output:**

```
🚀 Updating PostgreSQL on my-production...

⚠️ Major version upgrade detected (15.5 → 16.1)
   Automatic migration will be performed

Dumping database... ⏳
✅ Database dumped (2.3 GB)

Stopping PostgreSQL 15.5... ⏳
✅ Stopped

Installing PostgreSQL 16.1... ⏳
✅ Installed

Initializing new data directory... ⏳
✅ Initialized

Restoring database... ⏳
✅ Restored (2.3 GB)

Starting PostgreSQL 16.1... ⏳
✅ Started

Verifying data integrity... ⏳
✅ All tables verified

🎉 PostgreSQL update complete!
   Version: 15.5 → 16.1
   Downtime: 18 minutes
```

#### Verify Update

```
# Verify PostgreSQL
provisioning version taskserv postgres
# Note: this reports the version of the psql client installed on db-01
ssh db-01 "psql --version"
```

## Step 4: Update Multiple Services

### 4.1 Batch Update (Sequentially)

```
# Update multiple taskservs one by one
provisioning t update --infra my-production --taskservs cilium,containerd,redis
```

**Expected Output:**

```
🚀 Updating 3 taskservs on my-production...

[1/3] Updating cilium... ⏳
✅ cilium updated (1.15.0)

[2/3] Updating containerd... ⏳
✅ containerd updated (1.7.14)

[3/3] Updating redis... ⏳
✅ redis updated (7.2.4)

🎉 All updates complete!
   Updated: 3 taskservs
   Total time: 8 minutes
```

### 4.2 Parallel Update (Non-Dependent Services)

```
# Update taskservs in parallel (only if they don't depend on each other)
provisioning t update --infra my-production --taskservs redis,postgres --parallel
```

**Expected Output:**

```
🚀 Updating 2 taskservs in parallel on my-production...

redis: Updating... ⏳
postgres: Updating... ⏳

redis: ✅ Updated (7.2.4)
postgres: ✅ Updated (16.1)

🎉 All updates complete!
   Updated: 2 taskservs
   Total time: 3 minutes (parallel)
```

## Step 5: Update Server Configuration

### 5.1 Update Server Resources

```
# Edit server configuration
provisioning sops workspace/infra/my-production/servers.ncl
```

**Example: Upgrade server plan**

```
# Before
{
  name = "web-01",
  plan = "1xCPU-2GB" # Old plan
}

# After
{
  name = "web-01",
  plan = "2xCPU-4GB" # New plan
}
```

```
# Preview the server update, then apply it
provisioning s update --infra my-production --check
provisioning s update --infra my-production
```

### 5.2 Update Server OS

```
# Update operating system packages
provisioning s update --infra my-production --os-update
```

**Expected Output:**

```
🚀 Updating OS packages on my-production servers...

web-01: Updating packages... ⏳
✅ web-01: 24 packages updated

web-02: Updating packages... ⏳
✅ web-02: 24 packages updated

db-01: Updating packages... ⏳
✅ db-01: 24 packages updated

🎉 OS updates complete!
```

## Step 6: Rollback Procedures

### 6.1 Rollback Task Service

If an update fails or causes issues:

```
# Rollback to previous version
provisioning t rollback cilium --infra my-production
```

**Expected Output:**

```
🔄 Rolling back Cilium on my-production...

Current: 1.15.0
Target:  1.14.5 (previous version)

Rolling back: web-01... ⏳
✅ web-01 rolled back

Rolling back: web-02... ⏳
✅ web-02 rolled back

Verifying connectivity... ⏳
✅ All nodes connected

🎉 Rollback complete!
   Version: 1.15.0 → 1.14.5
```

### 6.2 Rollback from Backup

```
# Restore configuration from backup
provisioning ws restore my-production --from workspace/backups/my-production-20250930.tar.gz
```

### 6.3 Emergency Rollback

```
# Complete infrastructure rollback
provisioning rollback --infra my-production --to-snapshot
```

## Step 7: Post-Update Verification

### 7.1 Verify All Components

```
# Check overall health
provisioning health --infra my-production
```

**Expected Output:**

```
🏥 Health Check: my-production

Servers:
  ✅ web-01: Healthy
  ✅ web-02: Healthy
  ✅ db-01: Healthy

Task Services:
  ✅ kubernetes: 1.30.0 (healthy)
  ✅ containerd: 1.7.13 (healthy)
  ✅ cilium: 1.15.0 (healthy)
  ✅ postgres: 16.1 (healthy)

Clusters:
  ✅ buildkit: 2/2 replicas (healthy)

Overall Status: ✅ All systems healthy
```

### 7.2 Verify Version Updates

```
# Verify all versions are updated
provisioning version show
```

### 7.3 Run Integration Tests

```
# Run comprehensive tests
provisioning test all --infra my-production
```

**Expected Output:**

```
🧪 Running Integration Tests...

[1/5] Server connectivity... ⏳
✅ All servers reachable

[2/5] Kubernetes health... ⏳
✅ All nodes ready, all pods running

[3/5] Network connectivity... ⏳
✅ All services reachable

[4/5] Database connectivity... ⏳
✅ PostgreSQL responsive

[5/5] Application health... ⏳
✅ All applications healthy

🎉 All tests passed!
```

### 7.4 Monitor for Issues

```
# Monitor logs for errors
provisioning logs --infra my-production --follow --level error
```

## Update Checklist

Use this checklist for production updates:

- [ ] Check for available updates
- [ ] Review changelog and breaking changes
- [ ] Create configuration backup
- [ ] Test update in staging environment
- [ ] Schedule maintenance window
- [ ] Notify team/users of maintenance
- [ ] Update non-critical services first
- [ ] Verify each update before proceeding
- [ ] Update critical services with rolling updates
- [ ] Back up the database before major updates
- [ ] Verify all components after update
- [ ] Run integration tests
- [ ] Monitor for issues (30 minutes minimum)
- [ ] Document any issues encountered
- [ ] Close maintenance window

## Common Update Scenarios

### Scenario 1: Minor Security Patch

```
# Quick security update
provisioning t check-updates --security-only
provisioning t update --infra my-production --security-patches --yes
```

### Scenario 2: Major Version Upgrade

```
# Careful major version update
provisioning ws backup my-production
provisioning t check-migration --from X.Y --to X+1.Y
provisioning t create --infra my-production --migrate
provisioning test all --infra my-production
```

### Scenario 3: Emergency Hotfix

```
# Apply critical hotfix immediately
provisioning t create --infra my-production --hotfix --yes
```

## Troubleshooting Updates

### Issue: Update fails mid-process

**Solution:**

```
# Check update status
provisioning t status --infra my-production

# Resume failed update
provisioning t update --infra my-production --resume

# Or rollback
provisioning t rollback --infra my-production
```

### Issue: Service not starting after update

**Solution:**

```
# Check logs
provisioning logs --infra my-production

# Verify configuration
provisioning t validate --infra my-production

# Rollback if necessary
provisioning t rollback --infra my-production
```

### Issue: Data migration fails

**Solution:**

```
# Check migration logs
provisioning t migration-logs --infra my-production

# Restore from backup
provisioning t restore --infra my-production --from
```

## Best Practices

1. **Always Test First**: Test updates in staging before production
2. **Back Up Everything**: Create backups before any update
3. **Update Gradually**: Update one service at a time
4. **Monitor Closely**: Watch for errors after each update
5. **Have a Rollback Plan**: Always have a rollback strategy
6. **Document Changes**: Keep update logs for reference
7. **Schedule Wisely**: Update during low-traffic periods
8. **Verify Thoroughly**: Run tests after each update

## Next Steps

- **[Customize Guide](customize-infrastructure.md)** - Customize your infrastructure
- **[From Scratch Guide](from-scratch.md)** - Deploy new infrastructure
- **[Workflow Guide](../development/workflow.md)** - Automate with workflows

## Quick Reference

```
# Update workflow
provisioning t check-updates
provisioning ws backup my-production
provisioning t create --infra my-production --check
provisioning t create --infra my-production
provisioning version taskserv
provisioning health --infra my-production
provisioning test all --infra my-production
```

---

*This guide is part of the provisioning project documentation. Last updated: 2025-09-30*
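The update plan in Step 2.3 sorts pending changes into phases labeled by risk. You may want a similar first-pass check in your own scripts. The minimal POSIX shell sketch below reports only how far a version moves. `classify_bump` is a hypothetical helper and is not part of the `provisioning` CLI. It assumes purely numeric `MAJOR.MINOR.PATCH` versions, with missing parts counted as 0, and assumes the second argument is not older than the first. Because it measures version distance rather than risk, it will not always match the plan's phases. For example, it reports Kubernetes 1.29.0 → 1.30.0 as a minor bump, while the plan places that change under High Risk.

```sh
#!/bin/sh
# Hypothetical helper, not part of the provisioning CLI.
# Prints how far LATEST moves from CURRENT: none, patch, minor or major.
# Assumes purely numeric MAJOR.MINOR.PATCH versions; missing parts count as 0,
# so "15.5" is treated as 15.5.0. Pre-release suffixes (e.g. -rc1) are not handled.
classify_bump() {
    # Split each version on "." into its numeric components.
    IFS=. read -r cur_major cur_minor cur_patch <<EOF
$1
EOF
    IFS=. read -r new_major new_minor new_patch <<EOF
$2
EOF
    # Default absent components to 0.
    cur_minor=${cur_minor:-0}; cur_patch=${cur_patch:-0}
    new_minor=${new_minor:-0}; new_patch=${new_patch:-0}

    # The first component that differs decides the kind of bump.
    if [ "$cur_major" -ne "$new_major" ]; then
        echo major
    elif [ "$cur_minor" -ne "$new_minor" ]; then
        echo minor
    elif [ "$cur_patch" -ne "$new_patch" ]; then
        echo patch
    else
        echo none
    fi
}

# Versions taken from the check-updates output in Step 1.
classify_bump 1.29.0 1.30.0   # kubernetes -> minor
classify_bump 1.14.5 1.15.0   # cilium     -> minor
classify_bump 15.5 16.1       # postgres   -> major
classify_bump 7.2.3 7.2.3     # redis      -> none
```

Treat the result as a hint only. The changelog and breaking-changes sections shown by `provisioning t check-updates` remain the better guide to actual risk.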
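The rolling worker update in Step 3.2 reports the same cycle for each node: drain, update, uncordon, then wait. If you ever need to walk nodes through that cycle by hand, a minimal sketch using standard `kubectl` commands might look like the one below. It illustrates the pattern under stated assumptions and is not what `provisioning` runs internally. `upgrade_node` is a hypothetical placeholder for however your nodes actually receive the new Kubernetes packages. By default (`DRY_RUN=1`) the script only prints the commands it would run.

```sh
#!/bin/sh
# Sketch of a manual rolling update: one node at a time, drain -> upgrade ->
# uncordon -> wait until Ready. Not the provisioning CLI's implementation.
set -eu

# DRY_RUN=1 (default) prints commands instead of executing them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical placeholder: replace with your real per-node upgrade step.
upgrade_node() {
    echo "upgrade_node: no upgrade step defined for $1" >&2
    return 1
}

rolling_update() {
    for node in "$@"; do
        # Cordon and evict workloads; DaemonSet pods are skipped, emptyDir data is discarded.
        run kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
        run upgrade_node "$node"
        # Allow scheduling again, then block until the node reports Ready.
        run kubectl uncordon "$node"
        run kubectl wait --for=condition=Ready "node/$node" --timeout=300s
    done
}

# Example from this guide: web-02 is the only worker node.
rolling_update web-02
```

To execute the commands for real, fill in `upgrade_node` and set `DRY_RUN=0`. Because of `set -e`, a failing step stops the loop and leaves that node cordoned so you can inspect it before continuing.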