# Multi-Region High Availability Workspace

This workspace demonstrates a production-ready global high availability deployment spanning three cloud providers across three geographic regions:

- US East (DigitalOcean NYC): Primary region - active serving, primary database
- EU Central (Hetzner Germany): Secondary region - active serving, read replicas
- Asia Pacific (AWS Singapore): Tertiary region - active serving, read replicas

## Why Multi-Region High Availability?

### Business Benefits

- 99.99% Uptime Target: Automatic failover across regions
- Low Latency: Users served from the geographically closest region
- Compliance: Data residency in specific regions (GDPR for EU)
- Disaster Recovery: Tolerates the complete failure of one region

### Technical Benefits

- Load Distribution: Traffic spread across 3 regions
- Cost Optimization: Pay only for actual usage (~$311/month)
- Provider Diversity: Reduces vendor lock-in risk
- Capacity Planning: Scale independently per region

## Architecture Overview

┌─────────────────────────────────────────────────────────────────┐
│                       Global Route53 DNS                        │
│               Geographic Routing + Health Checks                │
└────────────────────────────────┬────────────────────────────────┘
                                 │
              ┌──────────────────┼──────────────────┐
              │                  │                  │
       ┌──────▼──────┐    ┌──────▼──────┐    ┌──────▼──────┐
       │     US      │    │     EU      │    │    APAC     │
       │   Primary   │    │  Secondary  │    │  Tertiary   │
       └──────┬──────┘    └──────┬──────┘    └──────┬──────┘
              │                  │                  │
       ┌──────▼──────────────────▼──────────────────▼──────┐
       │             Primary/Replica Database              │
       │              Replication (300s lag)               │
       └──────┬──────────────────┬──────────────────┬──────┘
              │                  │                  │
       ┌──────▼──────┐    ┌──────▼──────┐    ┌──────▼──────┐
       │ DO Droplets │    │   Hetzner   │    │     AWS     │
       │  3 x nyc3   │    │  3 x nbg1   │    │  3 x sgp1   │
       │Load Balancer│    │Load Balancer│    │Load Balancer│
       └──────┬──────┘    └──────┬──────┘    └──────┬──────┘
              │                  │                  │
              └──────────────────┴──────────────────┘
                        VPN Tunnels (IPSec)

### Regional Components

#### US East (DigitalOcean) - Primary

Region: nyc3 (New York)
Compute: 3x Droplets (s-2vcpu-4gb)
Load Balancer: Round-robin with health checks
Database: PostgreSQL (3-node cluster, Multi-AZ)
Network: VPC 10.0.0.0/16
Cost: ~$102/month

#### EU Central (Hetzner) - Secondary

Region: nbg1 (Nuremberg, Germany)
Compute: 3x CPX21 servers (4 vCPU, 8GB RAM)
Load Balancer: Hetzner Load Balancer
Database: Read-only replica (lag: 300s)
Network: vSwitch 10.1.0.0/16
Cost: ~$79/month (€72.70)

#### Asia Pacific (AWS) - Tertiary

Region: ap-southeast-1 (Singapore)
Compute: 3x EC2 t3.medium instances
Load Balancer: Application Load Balancer (ALB)
Database: RDS read-only replica (lag: 300s)
Network: VPC 10.2.0.0/16
Cost: ~$130/month

## Prerequisites

### 1. Cloud Accounts & Credentials

#### DigitalOcean

# Create API token
# Dashboard → API → Tokens/Keys → Generate New Token
# Scopes: read, write

export DIGITALOCEAN_TOKEN="dop_v1_abc123def456ghi789jkl012mno"

#### Hetzner

# Create API token
# Dashboard → Security → API Tokens → Generate Token

export HCLOUD_TOKEN="MC4wNTI1YmE1M2E4YmE0YTQzMTQyZTdlODYy"

#### AWS

# Create IAM user with programmatic access
# IAM → Users → Add User → Check "Programmatic access"
# Attach policies: AmazonEC2FullAccess, AmazonRDSFullAccess, AmazonRoute53FullAccess

export AWS_ACCESS_KEY_ID="AKIA1234567890ABCDEF"
export AWS_SECRET_ACCESS_KEY="wJalrXUtnFEMI/K7MDENG+j/zI0m1234567890ab"

### 2. CLI Tools

# Verify all CLIs are installed
which doctl
which hcloud
which aws
which nickel

# Versions
doctl version     # >= 1.94.0
hcloud version    # >= 1.35.0
aws --version     # >= 2.0
nickel --version  # >= 1.0

### 3. SSH Keys

#### DigitalOcean

# Import the public key
doctl compute ssh-key import provisioning-key \
  --public-key-file ~/.ssh/id_rsa.pub

# Note the key ID
doctl compute ssh-key list

#### Hetzner

# Upload SSH key
hcloud ssh-key create \
  --name provisioning-key \
  --public-key-from-file ~/.ssh/id_rsa.pub

# List keys
hcloud ssh-key list

#### AWS

# Create an EC2 key pair in the deployment region
aws ec2 create-key-pair \
  --region ap-southeast-1 \
  --key-name provisioning-key \
  --query 'KeyMaterial' --output text > provisioning-key.pem

chmod 600 provisioning-key.pem

### 4. Domain and DNS

You need a domain hosted in Route53, or the ability to create DNS records with another provider:

# Create hosted zone in Route53
aws route53 create-hosted-zone \
  --name api.example.com \
  --caller-reference "$(date +%s)"

# Note the Zone ID for updates
aws route53 list-hosted-zones

## Deployment

### Step 1: Configure the Workspace

Edit workspace.ncl to customize:

# Update SSH key references
droplets = digitalocean.Droplet & {
  ssh_keys = ["YOUR_DO_KEY_ID"],
  name = "us-app",
  region = "nyc3"
}

# Update AWS AMI IDs for your region
app_servers = aws.EC2 & {
  image_id = "ami-09d56f8956ab235b7",
  instance_type = "t3.medium",
  region = "ap-southeast-1"
}

# Update certificate ID
load_balancer = digitalocean.LoadBalancer & {
  forwarding_rules = [{
    certificate_id = "your-certificate-id",
    entry_protocol = "https",
    entry_port = 443
  }]
}

Edit config.toml:

# Update regional names if different
[providers.digitalocean]
region_name = "us-east"

[providers.hetzner]
region_name = "eu-central"

[providers.aws]
region_name = "asia-southeast"

# Update domain
[dns]
domain = "api.example.com"

### Step 2: Validate Configuration

# Validate Nickel syntax
nickel export workspace.ncl | jq . > /dev/null

# Verify credentials per provider
doctl auth init --access-token "$DIGITALOCEAN_TOKEN"
hcloud context use default
aws sts get-caller-identity

# Check connectivity
doctl account get
hcloud server list
aws ec2 describe-regions

### Step 3: Deploy

# Make script executable
chmod +x deploy.nu

# Execute deployment (step-by-step)
./deploy.nu

# Or with debug output
./deploy.nu --debug

# Or deploy per region
./deploy.nu --region us-east
./deploy.nu --region eu-central
./deploy.nu --region asia-southeast

### Step 4: Verify Global Deployment

# List resources per region
echo "=== US EAST (DigitalOcean) ==="
doctl compute droplet list --format Name,Region,Status,PublicIPv4
doctl compute load-balancer list

echo "=== EU CENTRAL (Hetzner) ==="
hcloud server list

echo "=== ASIA PACIFIC (AWS) ==="
aws ec2 describe-instances --region ap-southeast-1 \
  --query 'Reservations[*].Instances[*].[InstanceId,InstanceType,State.Name,PublicIpAddress]' \
  --output table
aws elbv2 describe-load-balancers --region ap-southeast-1

## Post-Deployment Configuration

### 1. SSL/TLS Certificates

#### AWS Certificate Manager

# Request a certificate in the ALB's region (ACM certificates are regional)
aws acm request-certificate \
  --domain-name api.example.com \
  --subject-alternative-names "*.api.example.com" \
  --validation-method DNS \
  --region ap-southeast-1

# Get certificate ARN
aws acm list-certificates --region ap-southeast-1

# Note the ARN for workspace.ncl

### 2. Database Primary/Replica Setup

# Connect to the US East primary
# (placeholder hostname; substitute the host of your DigitalOcean cluster)
PGPASSWORD=admin psql -h us-db-primary.abc123.us-east-1.rds.amazonaws.com -U admin -d postgres

-- Create the replication role used by the EU and APAC replicas
CREATE ROLE replication_user WITH REPLICATION LOGIN PASSWORD 'replica_password';

-- On the primary: list replication slots
SELECT slot_name, restart_lsn, confirmed_flush_lsn FROM pg_replication_slots;

-- On each read replica (EU and APAC): confirm it is in recovery and streaming
SELECT pg_is_in_recovery();
SELECT status, latest_end_lsn FROM pg_stat_wal_receiver;

### 3. Global DNS Setup

# Create Route53 records for each region
aws route53 change-resource-record-sets \
  --hosted-zone-id Z1234567890ABC \
  --change-batch '{
    "Changes": [
      {
        "Action": "CREATE",
        "ResourceRecordSet": {
          "Name": "us.api.example.com",
          "Type": "A",
          "TTL": 60,
          "ResourceRecords": [{"Value": "198.51.100.15"}]
        }
      },
      {
        "Action": "CREATE",
        "ResourceRecordSet": {
          "Name": "eu.api.example.com",
          "Type": "A",
          "TTL": 60,
          "ResourceRecords": [{"Value": "192.0.2.100"}]
        }
      },
      {
        "Action": "CREATE",
        "ResourceRecordSet": {
          "Name": "asia.api.example.com",
          "Type": "A",
          "TTL": 60,
          "ResourceRecords": [{"Value": "203.0.113.50"}]
        }
      }
    ]
  }'

# Health check for US East (repeat for eu.api.example.com and asia.api.example.com)
aws route53 create-health-check \
  --caller-reference "us-health-$(date +%s)" \
  --health-check-config '{
    "Type": "HTTPS",
    "ResourcePath": "/health",
    "FullyQualifiedDomainName": "us.api.example.com",
    "Port": 443,
    "RequestInterval": 30,
    "FailureThreshold": 3
  }'

### 4. Application Deployment

SSH to the web servers in each region:

# US East
US_IP=$(doctl compute droplet get us-app-1 --format PublicIPv4 --no-header)
ssh root@$US_IP

# Deploy application
cd /var/www
git clone https://github.com/your-org/app.git
cd app
./deploy.sh

# EU Central
EU_IP=$(hcloud server list --selector region=eu-central -o noheader -o columns=id | head -1 | xargs -I {} hcloud server ip {})
ssh root@$EU_IP

# Asia Pacific
ASIA_IP=$(aws ec2 describe-instances \
  --region ap-southeast-1 \
  --filters "Name=tag:Name,Values=asia-app-1" \
  --query 'Reservations[0].Instances[0].PublicIpAddress' \
  --output text)
ssh -i provisioning-key.pem ec2-user@$ASIA_IP

## Monitoring and Health Checks

### Regional Monitoring

Each region reports metrics to its provider's monitoring service (CloudWatch on AWS):

# DigitalOcean metrics
doctl monitoring metrics list droplet \
  --droplet-id 123456789 \
  --metric cpu

# Hetzner metrics (manual monitoring)
hcloud server list

# AWS CloudWatch
aws cloudwatch get-metric-statistics \
  --region ap-southeast-1 \
  --metric-name CPUUtilization \
  --namespace AWS/EC2 \
  --start-time 2024-01-01T00:00:00Z \
  --end-time 2024-01-02T00:00:00Z \
  --period 300 \
  --statistics Average

### Global Health Checks

Route53 health checks verify that every region is healthy:

# List health checks
aws route53 list-health-checks

# Get detailed status
aws route53 get-health-check-status --health-check-id abc123

# Verify replication lag
# Run in psql on each replica (EU, APAC); the function returns NULL on the primary
SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;

# Should be less than 300 seconds

### Alert Configuration

Configure alerts for critical metrics:

# CPU > 80%
aws cloudwatch put-metric-alarm \
  --alarm-name us-east-high-cpu \
  --alarm-actions arn:aws:sns:us-east-1:123456:ops-alerts \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --statistic Average \
  --period 300 \
  --evaluation-periods 2 \
  --threshold 80 \
  --comparison-operator GreaterThanThreshold

# Replication lag > 600s (ReplicaLag is the AWS/RDS metric for read replicas)
aws cloudwatch put-metric-alarm \
  --alarm-name replication-lag-critical \
  --namespace AWS/RDS \
  --metric-name ReplicaLag \
  --statistic Average \
  --period 300 \
  --evaluation-periods 2 \
  --threshold 600 \
  --comparison-operator GreaterThanThreshold

## Failover Testing

### Planned Failover - US East to EU Central

# 1. Stop traffic to US East
aws route53 change-resource-record-sets \
  --hosted-zone-id Z1234567890ABC \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "api.example.com",
        "Type": "A",
        "TTL": 60,
        "ResourceRecords": [{"Value": "192.0.2.100"}]
      }
    }]
  }'

# 2. Promote EU Central to primary
# Connect to EU read replica and promote
psql -h hetzner-eu-db.netz.de -U admin -d postgres \
  -c "SELECT pg_promote();"

# 3. Verify failover
curl https://api.example.com/health

# 4. Monitor replication (run on the remaining replicas once they follow the EU primary)
SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;

### Automatic Failover - Health Check Failure

Once the DNS records are associated with the health checks above, Route53 stops routing to a region whose health check fails:

# Simulate US East failure (for testing only)
# Stop web servers temporarily
doctl compute droplet-action power-off us-app-1 us-app-2 us-app-3

# Wait for the health check to fail: 3 failures x 30s interval = ~90s
sleep 90

# Verify traffic now routes to EU/APAC (curl -v prints headers on stderr)
curl -sv https://api.example.com/ 2>&1 >/dev/null | grep -iE "^< server"

# Restore US East
doctl compute droplet-action power-on us-app-1 us-app-2 us-app-3

## Scaling and Upgrades

### Add More Web Servers

Edit workspace.ncl:

# Increase droplet count
region_us_east.app_servers = digitalocean.Droplet & {
  count = 5,
  name = "us-app",
  region = "nyc3"
}

# Increase Hetzner servers
region_eu_central.app_servers = hetzner.Server & {
  count = 5,
  server_type = "cpx21",
  location = "nbg1"
}

# Increase AWS EC2 instances
region_asia_southeast.app_servers = aws.EC2 & {
  count = 5,
  instance_type = "t3.medium",
  region = "ap-southeast-1"
}

Redeploy:

./deploy.nu --region us-east
./deploy.nu --region eu-central
./deploy.nu --region asia-southeast

### Upgrade Database Instance Class

Edit workspace.ncl:

# US East primary
database = digitalocean.Database & {
  size = "db-s-4vcpu-8gb",
  name = "us-db-primary",
  engine = "pg"
}

DigitalOcean handles the upgrade with minimal downtime.

### Upgrade EC2 Instances

# Stop one instance at a time (rolling upgrade)
aws ec2 stop-instances --region ap-southeast-1 --instance-ids i-1234567890abcdef0

# Wait for stop
aws ec2 wait instance-stopped --region ap-southeast-1 --instance-ids i-1234567890abcdef0

# Modify instance type
aws ec2 modify-instance-attribute \
  --region ap-southeast-1 \
  --instance-id i-1234567890abcdef0 \
  --instance-type Value=t3.large

# Start instance
aws ec2 start-instances --region ap-southeast-1 --instance-ids i-1234567890abcdef0

## Cost Optimization

### Monthly Cost Breakdown

EU amounts are converted at the rate implied by the regional summary above (€72.70 ≈ $79).

| Component | US East | EU Central | Asia Pacific | Total |
| ------------- | ------- | ------------- | ------------ | --------- |
| Compute | $72 | €62.70 (~$68) | $80 | ~$220 |
| Database | $30 | Read Replica | $30 | $60 |
| Load Balancer | Free | ~€10 (~$11) | $20 | ~$31 |
| Total | $102 | ~$79 (€72.70) | $130 | **~$311** |

### Optimization Strategies

1. Reduce instance count from 3 to 2 (saves ~$30-40/month)
2. Downsize compute to s-1vcpu-2gb (saves ~$20-30/month)
3. Use Reserved Instances on AWS (saves ~20-30%)
4. Optimize data transfer between regions
5. Review backups and retention settings

### Monitor Costs

# DigitalOcean
doctl balance get

# AWS Cost Explorer
aws ce get-cost-and-usage \
  --time-period Start=2024-01-01,End=2024-01-31 \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --group-by Type=DIMENSION,Key=SERVICE

# Hetzner (manual via console)
# https://console.hetzner.cloud/billing

## Troubleshooting

### Issue: One Region Not Responding

Diagnosis:

# Check health checks
aws route53 get-health-check-status --health-check-id abc123

# Test regional endpoints
curl -v https://us.api.example.com/health
curl -v https://eu.api.example.com/health
curl -v https://asia.api.example.com/health

Solution:
- Check web server status in the affected region
- Verify the load balancer is healthy
- Review security groups/firewall rules
- Check application logs on the web servers

### Issue: High Replication Lag

Diagnosis:

# Check replay lag on a replica (the function returns NULL on the primary)
psql -h hetzner-eu-db.netz.de -U admin -d postgres \
  -c "SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;"

# Check replication slots on the primary
psql -h us-db-primary.abc123.us-east-1.rds.amazonaws.com -U admin -d postgres \
  -c "SELECT * FROM pg_replication_slots;"

Solution:
- Check network connectivity between regions
- Verify VPN tunnels are operational
- Reduce write load on the primary
- Monitor network bandwidth
- Consider a larger database instance

### Issue: VPN Tunnel Down

Diagnosis:

# Check VPN connection status on the AWS side
aws ec2 describe-vpn-connections --region ap-southeast-1

# Test connectivity between regions
ssh hetzner-server "ping -c 4 10.0.0.1"

Solution:
- Reconnect VPN tunnel manually
- Verify tunnel configuration
- Check security groups allow necessary ports
- Review ISP routing

## Cleanup

To destroy all resources (use carefully):

# DigitalOcean
doctl compute droplet delete --force us-app-1 us-app-2 us-app-3
doctl compute load-balancer delete --force us-lb
# Database clusters are deleted by ID (see: doctl databases list)
doctl databases delete --force "$US_DB_CLUSTER_ID"

# Hetzner
hcloud server delete hetzner-eu-1 hetzner-eu-2 hetzner-eu-3
hcloud load-balancer delete eu-lb
hcloud volume delete eu-backups

# AWS
aws ec2 terminate-instances --region ap-southeast-1 --instance-ids i-xxxxx
aws elbv2 delete-load-balancer --region ap-southeast-1 --load-balancer-arn arn:aws:elasticloadbalancing:ap-southeast-1:123456789:loadbalancer/app/asia-lb/1234567890abcdef
aws rds delete-db-instance --region ap-southeast-1 --db-instance-identifier asia-db-replica --skip-final-snapshot

# Route53
aws route53 delete-health-check --health-check-id abc123
aws route53 delete-hosted-zone --id Z1234567890ABC

## Next Steps

1. Disaster Recovery Testing: Regular failover drills
2. Auto-scaling: Add provider-specific autoscaling
3. Monitoring Integration: Connect to centralized monitoring (Datadog, New Relic, Prometheus)
4. Backup Automation: Implement cross-region backups
5. Cost Optimization: Review and tune resource sizing
6. Security Hardening: Implement WAF, DDoS protection
7. Load Testing: Validate performance across regions

## Support

For issues or questions:

- Review the multi-provider networking guide
- Check provider-specific documentation
- Review regional deployment logs: ./deploy.nu --debug
- Test regional endpoints independently

## Files

- workspace.ncl: Global infrastructure definition (Nickel)
- config.toml: Provider credentials and regional settings
- deploy.nu: Multi-region deployment orchestration (Nushell)
- README.md: This file
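
The Support suggestion to test regional endpoints independently can be scripted. The following is a minimal sketch, not part of the workspace: it assumes the `us`, `eu`, and `asia` subdomains of `api.example.com` from the DNS setup and the `/health` path used by the Route53 health checks. The `check_regions` function and the `HEALTH_PROBE` variable are hypothetical names introduced here; `HEALTH_PROBE` exists only so the loop can be exercised without network access.

```shell
#!/usr/bin/env bash
# Sketch: probe each regional /health endpoint and report per-region status.
# HEALTH_PROBE (hypothetical) is the command run against each URL; it defaults
# to a quiet curl that fails on HTTP errors and times out after 5 seconds.
HEALTH_PROBE="${HEALTH_PROBE:-curl -fsS --max-time 5 -o /dev/null}"

check_regions() {
  local failed=0 region
  for region in "$@"; do
    # HEALTH_PROBE is intentionally unquoted so its flags are word-split.
    if $HEALTH_PROBE "https://${region}.api.example.com/health"; then
      echo "${region}: healthy"
    else
      echo "${region}: UNREACHABLE"
      failed=1
    fi
  done
  # Non-zero exit status if any region failed its probe.
  return "$failed"
}
```

After sourcing the file, `check_regions us eu asia` prints one line per region and returns non-zero if any probe fails, which makes it usable as a gate in a failover drill.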