# Multi-Region High Availability Workspace

This workspace demonstrates a production-ready global high availability deployment spanning three cloud providers across three geographic regions:

- **US East (DigitalOcean NYC)**: Primary region - active serving, primary database
- **EU Central (Hetzner Germany)**: Secondary region - active serving, read replicas
- **Asia Pacific (AWS Singapore)**: Tertiary region - active serving, read replicas

## Why Multi-Region High Availability?

### Business Benefits

- **99.99% Uptime**: Automatic failover across regions
- **Low Latency**: Users served from the geographically closest region
- **Compliance**: Data residency in specific regions (GDPR for EU)
- **Disaster Recovery**: Complete regional failure tolerance

### Technical Benefits

- **Load Distribution**: Traffic spread across 3 regions
- **Cost Optimization**: Pay only for actual usage (~$311/month)
- **Provider Diversity**: Reduces vendor lock-in risk
- **Capacity Planning**: Scale independently per region

## Architecture Overview

```
┌─────────────────────────────────────────────────────────────────┐
│                       Global Route53 DNS                        │
│               Geographic Routing + Health Checks                │
└────────────────────────────────┬────────────────────────────────┘
                                 │
          ┌──────────────────────┼──────────────────────┐
          │                      │                      │
┌─────────▼─────────┐  ┌─────────▼─────────┐  ┌─────────▼─────────┐
│        US         │  │        EU         │  │       APAC        │
│      Primary      │  │     Secondary     │  │     Tertiary      │
└─────────┬─────────┘  └─────────┬─────────┘  └─────────┬─────────┘
          │                      │                      │
┌─────────▼──────────────────────▼──────────────────────▼─────────┐
│                    Primary/Replica Database                     │
│                     Replication (300s lag)                      │
└─────────┬──────────────────────┬──────────────────────┬─────────┘
          │                      │                      │
┌─────────▼─────────┐  ┌─────────▼─────────┐  ┌─────────▼─────────┐
│    DO Droplets    │  │      Hetzner      │  │        AWS        │
│      3x nyc3      │  │      3x nbg1      │  │ 3x ap-southeast-1 │
│   Load Balancer   │  │   Load Balancer   │  │   Load Balancer   │
└─────────┬─────────┘  └─────────┬─────────┘  └─────────┬─────────┘
          │                      │                      │
┌─────────▼──────────────────────▼──────────────────────▼─────────┐
│                       VPN Tunnels (IPSec)                       │
└─────────────────────────────────────────────────────────────────┘
```

### Regional Components

#### US East (DigitalOcean) - Primary

```
Region: nyc3 (New York)
Compute: 3x Droplets (s-2vcpu-4gb)
Load Balancer: Round-robin with health checks
Database: PostgreSQL (3-node cluster, Multi-AZ)
Network: VPC 10.0.0.0/16
Cost: ~$102/month
```

#### EU Central (Hetzner) - Secondary

```
Region: nbg1 (Nuremberg, Germany)
Compute: 3x CPX21 servers (3 vCPU, 4GB RAM)
Load Balancer: Hetzner Load Balancer
Database: Read-only replica (lag: 300s)
Network: vSwitch 10.1.0.0/16
Cost: ~$79/month (€72.70)
```

#### Asia Pacific (AWS) - Tertiary

```
Region: ap-southeast-1 (Singapore)
Compute: 3x EC2 t3.medium instances
Load Balancer: Application Load Balancer (ALB)
Database: RDS read-only replica (lag: 300s)
Network: VPC 10.2.0.0/16
Cost: ~$130/month
```

## Prerequisites

### 1. Cloud Accounts & Credentials

#### DigitalOcean

```bash
# Create API token
# Dashboard → API → Tokens/Keys → Generate New Token
# Scopes: read, write
export DIGITALOCEAN_TOKEN="dop_v1_abc123def456ghi789jkl012mno"
```

#### Hetzner

```bash
# Create API token
# Dashboard → Security → API Tokens → Generate Token
export HCLOUD_TOKEN="MC4wNTI1YmE1M2E4YmE0YTQzMTQyZTdlODYy"
```

#### AWS

```bash
# Create IAM user with programmatic access
# IAM → Users → Add User → Check "Programmatic access"
# Attach policies: AmazonEC2FullAccess, AmazonRDSFullAccess, AmazonRoute53FullAccess
export AWS_ACCESS_KEY_ID="AKIA1234567890ABCDEF"
export AWS_SECRET_ACCESS_KEY="wJalrXUtnFEMI/K7MDENG+j/zI0m1234567890ab"
```

### 2. CLI Tools

```bash
# Verify all CLIs are installed
which doctl
which hcloud
which aws
which nickel

# Versions
doctl version      # >= 1.94.0
hcloud version     # >= 1.35.0
aws --version      # >= 2.0
nickel --version   # >= 1.0
```

### 3. SSH Keys

#### DigitalOcean

```bash
# Upload SSH key ("import" reads the key from a file)
doctl compute ssh-key import provisioning-key \
  --public-key-file ~/.ssh/id_rsa.pub

# Note the key ID
doctl compute ssh-key list
```

#### Hetzner

```bash
# Upload SSH key
hcloud ssh-key create \
  --name provisioning-key \
  --public-key-from-file ~/.ssh/id_rsa.pub

# List keys
hcloud ssh-key list
```

#### AWS

```bash
# Create an EC2 key pair (key pairs are regional, so create it where the instances run)
aws ec2 create-key-pair \
  --region ap-southeast-1 \
  --key-name provisioning-key \
  --query 'KeyMaterial' --output text > provisioning-key.pem

chmod 600 provisioning-key.pem
```

### 4. Domain and DNS

You need a domain hosted in Route53, or the ability to create DNS records with another provider:

```bash
# Create hosted zone in Route53
aws route53 create-hosted-zone \
  --name api.example.com \
  --caller-reference "$(date +%s)"

# Note the Zone ID for updates
aws route53 list-hosted-zones
```

## Deployment

### Step 1: Configure the Workspace

Edit `workspace.ncl` to customize:

```nickel
# Update SSH key references
droplets = digitalocean.Droplet & {
  ssh_keys = ["YOUR_DO_KEY_ID"],
  name = "us-app",
  region = "nyc3"
}

# Update AWS AMI IDs for your region
app_servers = aws.EC2 & {
  image_id = "ami-09d56f8956ab235b7",
  instance_type = "t3.medium",
  region = "ap-southeast-1"
}

# Update certificate ID
load_balancer = digitalocean.LoadBalancer & {
  forwarding_rules = [{
    certificate_id = "your-certificate-id",
    entry_protocol = "https",
    entry_port = 443
  }]
}
```

Edit `config.toml`:

```toml
# Update regional names if different
[providers.digitalocean]
region_name = "us-east"

[providers.hetzner]
region_name = "eu-central"

[providers.aws]
region_name = "asia-southeast"

# Update domain
[dns]
domain = "api.example.com"
```

### Step 2: Validate Configuration

```bash
# Validate Nickel syntax
nickel export workspace.ncl | jq . > /dev/null

# Verify credentials per provider
doctl auth init --access-token "$DIGITALOCEAN_TOKEN"
hcloud context use default
aws sts get-caller-identity

# Check connectivity
doctl account get
hcloud server list
aws ec2 describe-regions
```

### Step 3: Deploy

```bash
# Make script executable
chmod +x deploy.nu

# Execute deployment (step-by-step)
./deploy.nu

# Or with debug output
./deploy.nu --debug

# Or deploy per region
./deploy.nu --region us-east
./deploy.nu --region eu-central
./deploy.nu --region asia-southeast
```

### Step 4: Verify Global Deployment

```bash
# List resources per region
echo "=== US EAST (DigitalOcean) ==="
doctl compute droplet list --format Name,Region,Status,PublicIPv4
doctl compute load-balancer list

echo "=== EU CENTRAL (Hetzner) ==="
hcloud server list

echo "=== ASIA PACIFIC (AWS) ==="
aws ec2 describe-instances --region ap-southeast-1 \
  --query 'Reservations[*].Instances[*].[InstanceId,InstanceType,State.Name,PublicIpAddress]' \
  --output table
aws elbv2 describe-load-balancers --region ap-southeast-1
```

## Post-Deployment Configuration

### 1. SSL/TLS Certificates

#### AWS Certificate Manager

```bash
# Request the certificate in the region of the ALB (ACM certificates are regional)
aws acm request-certificate \
  --domain-name api.example.com \
  --subject-alternative-names '*.api.example.com' \
  --validation-method DNS \
  --region ap-southeast-1

# Get certificate ARN
aws acm list-certificates --region ap-southeast-1

# Note the ARN for workspace.ncl
```

### 2. Database Primary/Replica Setup

```bash
# On the US East primary: create the replication role used by EU and APAC
# (example hostname; substitute the connection host of your primary)
PGPASSWORD=admin psql -h us-db-primary.abc123.us-east-1.rds.amazonaws.com -U admin -d postgres \
  -c "CREATE ROLE replication_user WITH REPLICATION LOGIN PASSWORD 'replica_password';"

# On the EU read replica (Hetzner): verify replication
psql -h hetzner-eu-db.netz.de -U admin -d postgres \
  -c "SELECT slot_name, restart_lsn, confirmed_flush_lsn FROM pg_replication_slots;"

# On the APAC read replica (AWS RDS): verify replica status from a psql session
#   SELECT local_id, external_id, remote_lsn, local_lsn FROM pg_replication_origin_status;
```

### 3. Global DNS Setup

```bash
# Create Route53 records for each region
aws route53 change-resource-record-sets \
  --hosted-zone-id Z1234567890ABC \
  --change-batch '{
    "Changes": [
      {
        "Action": "CREATE",
        "ResourceRecordSet": {
          "Name": "us.api.example.com",
          "Type": "A",
          "TTL": 60,
          "ResourceRecords": [{"Value": "198.51.100.15"}]
        }
      },
      {
        "Action": "CREATE",
        "ResourceRecordSet": {
          "Name": "eu.api.example.com",
          "Type": "A",
          "TTL": 60,
          "ResourceRecords": [{"Value": "192.0.2.100"}]
        }
      },
      {
        "Action": "CREATE",
        "ResourceRecordSet": {
          "Name": "asia.api.example.com",
          "Type": "A",
          "TTL": 60,
          "ResourceRecords": [{"Value": "203.0.113.50"}]
        }
      }
    ]
  }'

# Health checks per region (--caller-reference is required and must be unique)
aws route53 create-health-check \
  --caller-reference "us-health-$(date +%s)" \
  --health-check-config '{
    "Type": "HTTPS",
    "ResourcePath": "/health",
    "FullyQualifiedDomainName": "us.api.example.com",
    "Port": 443,
    "RequestInterval": 30,
    "FailureThreshold": 3
  }'
```

### 4. Application Deployment

SSH to the web servers in each region:

```bash
# US East
US_IP=$(doctl compute droplet get us-app-1 --format PublicIPv4 --no-header)
ssh root@"$US_IP"

# Deploy application (run on the server)
cd /var/www
git clone https://github.com/your-org/app.git
cd app
./deploy.sh

# EU Central
EU_IP=$(hcloud server list --selector region=eu-central -o noheader -o columns=id | head -1 | xargs -I {} hcloud server ip {})
ssh root@"$EU_IP"

# Asia Pacific
ASIA_IP=$(aws ec2 describe-instances \
  --region ap-southeast-1 \
  --filters "Name=tag:Name,Values=asia-app-1" \
  --query 'Reservations[0].Instances[0].PublicIpAddress' \
  --output text)
ssh -i provisioning-key.pem ec2-user@"$ASIA_IP"
```

## Monitoring and Health Checks

### Regional Monitoring

Each region reports metrics to its provider's monitoring service (CloudWatch on AWS):

```bash
# DigitalOcean metrics
doctl monitoring metrics list droplet \
  --droplet-id 123456789 \
  --metric cpu

# Hetzner metrics (manual monitoring)
hcloud server list

# AWS CloudWatch
aws cloudwatch get-metric-statistics \
  --metric-name CPUUtilization \
  --namespace AWS/EC2 \
  --start-time 2024-01-01T00:00:00Z \
  --end-time 2024-01-02T00:00:00Z \
  --period 300 \
  --statistics Average
```

### Global Health Checks

Route53 health checks verify that all regions are healthy:

```bash
# List health checks
aws route53 list-health-checks

# Get detailed status
aws route53 get-health-check-status --health-check-id abc123

# Verify replication lag
# Run this on a replica: pg_last_xact_replay_timestamp() returns NULL on the primary
psql -h hetzner-eu-db.netz.de -U admin -d postgres \
  -c "SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;"
# Should be less than 300 seconds
```

### Alert Configuration

Configure alerts for critical metrics:

```bash
# CPU > 80% (period and evaluation-periods are example values)
aws cloudwatch put-metric-alarm \
  --alarm-name us-east-high-cpu \
  --alarm-actions arn:aws:sns:us-east-1:123456:ops-alerts \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --statistic Average \
  --period 300 \
  --evaluation-periods 2 \
  --threshold 80 \
  --comparison-operator GreaterThanThreshold

# Replication lag > 600s (RDS publishes this metric as ReplicaLag)
aws cloudwatch put-metric-alarm \
  --alarm-name replication-lag-critical \
  --namespace AWS/RDS \
  --metric-name ReplicaLag \
  --dimensions Name=DBInstanceIdentifier,Value=asia-db-replica \
  --statistic Average \
  --period 300 \
  --evaluation-periods 2 \
  --threshold 600 \
  --comparison-operator GreaterThanThreshold
```

## Failover Testing

### Planned Failover - US East to EU Central

```bash
# 1. Stop traffic to US East
aws route53 change-resource-record-sets \
  --hosted-zone-id Z1234567890ABC \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "api.example.com",
        "Type": "A",
        "TTL": 60,
        "ResourceRecords": [{"Value": "192.0.2.100"}]
      }
    }]
  }'

# 2. Promote EU Central to primary
# Connect to EU read replica and promote
psql -h hetzner-eu-db.netz.de -U admin -d postgres \
  -c "SELECT pg_promote();"

# 3. Verify failover
curl https://api.example.com/health

# 4. Monitor replication lag (EU is now the primary); run in psql on a remaining replica:
#   SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;
```

### Automatic Failover - Health Check Failure

Route53 automatically fails over when health checks fail:

```bash
# Simulate US East failure (for testing only)
# Stop web servers temporarily
doctl compute droplet-action power-off us-app-1 us-app-2 us-app-3

# Wait for the health check to fail (30 s interval x 3 failures = ~90 s)
sleep 90

# Verify traffic now routes to EU/APAC (curl -v prints headers to stderr)
curl -sv https://api.example.com/ 2>&1 | grep -iE "^< server"

# Restore US East
doctl compute droplet-action power-on us-app-1 us-app-2 us-app-3
```

## Scaling and Upgrades

### Add More Web Servers

Edit `workspace.ncl`:

```nickel
# Increase droplet count
region_us_east.app_servers = digitalocean.Droplet & {
  count = 5,
  name = "us-app",
  region = "nyc3"
}

# Increase Hetzner servers
region_eu_central.app_servers = hetzner.Server & {
  count = 5,
  server_type = "cpx21",
  location = "nbg1"
}

# Increase AWS EC2 instances
region_asia_southeast.app_servers = aws.EC2 & {
  count = 5,
  instance_type = "t3.medium",
  region = "ap-southeast-1"
}
```

Redeploy:

```bash
./deploy.nu --region us-east
./deploy.nu --region eu-central
./deploy.nu --region asia-southeast
```

### Upgrade Database Instance Class

Edit `workspace.ncl`:

```nickel
# US East primary
database = digitalocean.Database & {
  size = "db-s-4vcpu-8gb",
  name = "us-db-primary",
  engine = "pg"
}
```

DigitalOcean handles the resize with minimal downtime.

### Upgrade EC2 Instances

```bash
# Stop instances for upgrade (rolling)
aws ec2 stop-instances --region ap-southeast-1 --instance-ids i-1234567890abcdef0

# Wait for stop
aws ec2 wait instance-stopped --region ap-southeast-1 --instance-ids i-1234567890abcdef0

# Modify instance type (the attribute is passed as a {"Value": ...} structure)
aws ec2 modify-instance-attribute \
  --region ap-southeast-1 \
  --instance-id i-1234567890abcdef0 \
  --instance-type '{"Value": "t3.large"}'

# Start instance
aws ec2 start-instances --region ap-southeast-1 --instance-ids i-1234567890abcdef0
```

## Cost Optimization

### Monthly Cost Breakdown

EU prices are billed in euros; the dollar figures in parentheses use the same rate as the regional total (€72.70 ≈ $79).

| Component | US East | EU Central | Asia Pacific | Total |
|-----------|---------|------------|--------------|-------|
| Compute | $72 | €62.70 (~$68) | $80 | ~$220 |
| Database | $30 | Read Replica | $30 | $60 |
| Load Balancer | Free | ~€10 (~$11) | ~$20 | ~$31 |
| **Total** | **$102** | **€72.70 (~$79)** | **$130** | **~$311** |

### Optimization Strategies

1. Reduce instance count from 3 to 2 (saves ~$30-40/month)
2. Downsize compute to s-1vcpu-2gb (saves ~$20-30/month)
3. Use Reserved Instances on AWS (saves ~20-30%)
4. Optimize data transfer between regions
5. Review backups and retention settings

### Monitor Costs

```bash
# DigitalOcean (current account balance)
doctl balance get

# AWS Cost Explorer (the End date is exclusive, so this covers all of January)
aws ce get-cost-and-usage \
  --time-period Start=2024-01-01,End=2024-02-01 \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --group-by Type=DIMENSION,Key=SERVICE

# Hetzner (manual via console)
# https://console.hetzner.cloud/billing
```

## Troubleshooting

### Issue: One Region Not Responding

**Diagnosis**:

```bash
# Check health checks
aws route53 get-health-check-status --health-check-id abc123

# Test regional endpoints
curl -v https://us.api.example.com/health
curl -v https://eu.api.example.com/health
curl -v https://asia.api.example.com/health
```

**Solution**:

- Check web server status in the affected region
- Verify the load balancer is healthy
- Review security groups/firewall rules
- Check application logs on the web servers

### Issue: High Replication Lag

**Diagnosis**:

```bash
# Check replication status as seen from the primary (one row per connected replica)
psql -h us-db-primary.abc123.us-east-1.rds.amazonaws.com -U admin -d postgres \
  -c "SELECT application_name, state, replay_lag FROM pg_stat_replication;"

# Check replication slots
psql -h us-db-primary.abc123.us-east-1.rds.amazonaws.com -U admin -d postgres \
  -c "SELECT * FROM pg_replication_slots;"
```

**Solution**:

- Check network connectivity between regions
- Verify VPN tunnels are operational
- Reduce write load on the primary
- Monitor network bandwidth
- Consider a larger database instance if lag persists

### Issue: VPN Tunnel Down

**Diagnosis**:

```bash
# Check VPN connection status (the AWS side of the tunnels lives in ap-southeast-1)
aws ec2 describe-vpn-connections --region ap-southeast-1

# Test connectivity between regions
ssh hetzner-server "ping -c 4 10.0.0.1"
```

**Solution**:

- Reconnect the VPN tunnel manually
- Verify tunnel configuration
- Check that security groups allow the necessary ports
- Review ISP routing

## Cleanup

To destroy all resources (use carefully):

```bash
# DigitalOcean
doctl compute droplet delete --force us-app-1 us-app-2 us-app-3
doctl compute load-balancer delete --force us-lb
# Managed databases live under "doctl databases" (use the ID from "doctl databases list")
doctl databases delete --force us-db-primary

# Hetzner
hcloud server delete hetzner-eu-1 hetzner-eu-2 hetzner-eu-3
hcloud load-balancer delete eu-lb
hcloud volume delete eu-backups

# AWS
aws ec2 terminate-instances --region ap-southeast-1 --instance-ids i-xxxxx
aws elbv2 delete-load-balancer --region ap-southeast-1 \
  --load-balancer-arn arn:aws:elasticloadbalancing:ap-southeast-1:123456789:loadbalancer/app/asia-lb/1234567890abcdef
aws rds delete-db-instance --region ap-southeast-1 \
  --db-instance-identifier asia-db-replica --skip-final-snapshot

# Route53 (a hosted zone can only be deleted once its non-default records are removed)
aws route53 delete-health-check --health-check-id abc123
aws route53 delete-hosted-zone --id Z1234567890ABC
```

## Next Steps

1. Disaster Recovery Testing: Regular failover drills
2. Auto-scaling: Add provider-specific autoscaling
3. Monitoring Integration: Connect to centralized monitoring (Datadog, New Relic, Prometheus)
4. Backup Automation: Implement cross-region backups
5. Cost Optimization: Review and tune resource sizing
6. Security Hardening: Implement WAF, DDoS protection
7. Load Testing: Validate performance across regions

## Support

For issues or questions:

- Review the multi-provider networking guide
- Check provider-specific documentation
- Review regional deployment logs: `./deploy.nu --debug`
- Test regional endpoints independently

## Files

- `workspace.ncl`: Global infrastructure definition (Nickel)
- `config.toml`: Provider credentials and regional settings
- `deploy.nu`: Multi-region deployment orchestration (Nushell)
- `README.md`: This file
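
As a quick sanity check on the figures in the Monthly Cost Breakdown table, the sketch below adds up the three regional monthly totals. The variable names are illustrative only and are not part of `workspace.ncl` or `deploy.nu`; the EU value is this README's own dollar approximation of €72.70.

```bash
#!/usr/bin/env bash
# Regional monthly totals in USD, copied from the cost breakdown table.
# Variable names are hypothetical; they exist only for this illustration.
us_east=102
eu_central=79      # approximate USD value of EUR 72.70
asia_pacific=130

# Integer arithmetic is sufficient because the table rounds to whole dollars.
total=$((us_east + eu_central + asia_pacific))
echo "Estimated monthly total: \$${total}"
```

Re-running the sum after changing instance counts or sizes gives a rough idea of how the overall estimate moves before you redeploy.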