Multi-Region High Availability Workspace

This workspace demonstrates a production-ready, global high-availability deployment spanning three cloud providers, one in each of three geographic regions:

  • US East (DigitalOcean NYC): Primary region - active serving, primary database
  • EU Central (Hetzner Germany): Secondary region - active serving, read replicas
  • Asia Pacific (AWS Singapore): Tertiary region - active serving, read replicas

Why Multi-Region High Availability?

Business Benefits

  • 99.99% Uptime Target: Automatic failover across regions
  • Low Latency: Users served from geographically closest region
  • Compliance: Data residency in specific regions (GDPR for EU)
  • Disaster Recovery: Complete regional failure tolerance
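
As a rough worked figure, a 99.99% uptime target leaves about 52.6 minutes of downtime per year. The sketch below shows that arithmetic; the function name is illustrative and not part of this workspace.

```python
# Downtime budget implied by an availability target.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def downtime_budget_minutes(availability_pct: float,
                            period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return the minutes of downtime an availability target permits."""
    return period_minutes * (1 - availability_pct / 100)


if __name__ == "__main__":
    for target in (99.9, 99.99):
        print(f"{target}% -> {downtime_budget_minutes(target):.1f} min/year")
```

Each additional nine cuts the budget by a factor of ten, which is why regional failover time matters at this target.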

Technical Benefits

  • Load Distribution: Traffic spread across 3 regions
  • Cost Optimization: Pay only for actual usage (~$311/month)
  • Provider Diversity: Reduces vendor lock-in risk
  • Capacity Planning: Scale independently per region

Architecture Overview

┌───────────────────────────────────────────┐
│            Global Route53 DNS             │
│    Geographic Routing + Health Checks     │
└─────────────────────┬─────────────────────┘
                      │
      ┌───────────────┼───────────────┐
┌─────▼─────┐   ┌─────▼─────┐   ┌─────▼─────┐
│    US     │   │    EU     │   │   APAC    │
│  Primary  │   │ Secondary │   │ Tertiary  │
└─────┬─────┘   └─────┬─────┘   └─────┬─────┘
      │               │               │
┌─────▼───────────────▼───────────────▼─────┐
│    PostgreSQL Primary + Read Replicas     │
│          Replication (300s lag)           │
└─────┬───────────────┬───────────────┬─────┘
      │               │               │
┌─────▼───────────────▼───────────────▼─────┐
│DO Droplets       Hetzner           AWS    │
│ 3 x nyc3        3 x nbg1        3 x sgp1  │
│        Load Balancer (per region)         │
└─────────────────────┬─────────────────────┘
                      │
           ┌──────────┴──────────┐
           │ VPN Tunnels (IPsec) │
           └─────────────────────┘
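
The routing behaviour in the diagram (serve each user from the closest region, skip regions that fail health checks) can be sketched as a lookup. This is only an illustration: the client areas and fallback orderings are assumptions, and Route53 applies its own routing policy rather than this code.

```python
from typing import Optional, Set

# Preference order per client area: closest region first, then fallbacks.
# These orderings are an illustrative assumption, not taken from workspace.ncl.
FALLBACK_ORDER = {
    "americas": ["us-east", "eu-central", "asia-southeast"],
    "europe": ["eu-central", "us-east", "asia-southeast"],
    "asia": ["asia-southeast", "eu-central", "us-east"],
}


def pick_region(client_area: str, healthy: Set[str]) -> Optional[str]:
    """Return the first healthy region in the client's preference order."""
    for region in FALLBACK_ORDER[client_area]:
        if region in healthy:
            return region
    return None  # every region is down


if __name__ == "__main__":
    # US East fails its health check, so American clients fall back to EU Central.
    print(pick_region("americas", {"eu-central", "asia-southeast"}))
```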

Regional Components

US East (DigitalOcean) - Primary

Region: nyc3 (New York)
Compute: 3x Droplets (s-2vcpu-4gb)
Load Balancer: Round-robin with health checks
Database: PostgreSQL (3-node cluster: primary with standby nodes)
Network: VPC 10.0.0.0/16
Cost: ~$102/month

EU Central (Hetzner) - Secondary

Region: nbg1 (Nuremberg, Germany)
Compute: 3x CPX21 servers (3 vCPU, 4GB RAM)
Load Balancer: Hetzner Load Balancer
Database: Read-only replica (lag: 300s)
Network: vSwitch 10.1.0.0/16
Cost: ~$79/month (€72.70)

Asia Pacific (AWS) - Tertiary

Region: ap-southeast-1 (Singapore)
Compute: 3x EC2 t3.medium instances
Load Balancer: Application Load Balancer (ALB)
Database: RDS read-only replica (lag: 300s)
Network: VPC 10.2.0.0/16
Cost: ~$130/month
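
Because the three private networks are joined by VPN tunnels, their address ranges should not overlap. A minimal check using Python's standard `ipaddress` module, with the CIDRs listed above (the region keys and the helper name are illustrative):

```python
import ipaddress
from itertools import combinations

# Private network ranges from the regional component lists.
REGION_CIDRS = {
    "us-east": "10.0.0.0/16",
    "eu-central": "10.1.0.0/16",
    "asia-southeast": "10.2.0.0/16",
}


def overlapping_pairs(cidrs):
    """Return the pairs of region names whose networks overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in cidrs.items()}
    return [(a, b) for a, b in combinations(nets, 2) if nets[a].overlaps(nets[b])]


if __name__ == "__main__":
    clashes = overlapping_pairs(REGION_CIDRS)
    print("no overlaps" if not clashes else f"overlapping networks: {clashes}")
```

Running a check like this before changing any CIDR in workspace.ncl catches routing conflicts early.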

Prerequisites

1. Cloud Accounts & Credentials

DigitalOcean

# Create API token
# Dashboard → API → Tokens/Keys → Generate New Token
# Scopes: read, write

export DIGITALOCEAN_TOKEN="dop_v1_abc123def456ghi789jkl012mno"

Hetzner

# Create API token
# Dashboard → Security → API Tokens → Generate Token

export HCLOUD_TOKEN="MC4wNTI1YmE1M2E4YmE0YTQzMTQyZTdlODYy"

AWS

# Create IAM user with programmatic access
# IAM → Users → Add User → Check "Programmatic access"
# Attach policies: AmazonEC2FullAccess, AmazonRDSFullAccess, AmazonRoute53FullAccess

export AWS_ACCESS_KEY_ID="AKIA1234567890ABCDEF"
export AWS_SECRET_ACCESS_KEY="wJalrXUtnFEMI/K7MDENG+j/zI0m1234567890ab"

2. CLI Tools

# Verify all CLIs are installed
which doctl
which hcloud
which aws
which nickel

# Versions
doctl version          # >= 1.94.0
hcloud version         # >= 1.35.0
aws --version          # >= 2.0
nickel --version       # >= 1.0
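
The same presence check can be scripted so that a deployment wrapper fails early. This is a hedged sketch using `shutil.which`; the helper name is hypothetical, and it checks only that each tool is on PATH, not its version.

```python
import shutil

# CLI tools this workspace expects on PATH.
REQUIRED_TOOLS = ("doctl", "hcloud", "aws", "nickel")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("missing CLI tools: " + ", ".join(missing))
    else:
        print("all CLI tools found")
```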

3. SSH Keys

DigitalOcean

# Upload SSH key (import reads the public key from a file)
doctl compute ssh-key import provisioning-key \
  --public-key-file ~/.ssh/id_rsa.pub

# Note the key ID
doctl compute ssh-key list

Hetzner

# Upload SSH key
hcloud ssh-key create \
  --name provisioning-key \
  --public-key-from-file ~/.ssh/id_rsa.pub

# List keys
hcloud ssh-key list

AWS

# Create or import EC2 key pair
aws ec2 create-key-pair \
  --key-name provisioning-key \
  --query 'KeyMaterial' --output text > provisioning-key.pem

chmod 600 provisioning-key.pem

4. Domain and DNS

You need a domain hosted in Route53, or the ability to create DNS records with your DNS provider:

# Create hosted zone in Route53
aws route53 create-hosted-zone \
  --name api.example.com \
  --caller-reference $(date +%s)

# Note the Zone ID for updates
aws route53 list-hosted-zones

Deployment

Step 1: Configure the Workspace

Edit workspace.ncl to customize:

# Update SSH key references
droplets = digitalocean.Droplet & {
  ssh_keys = ["YOUR_DO_KEY_ID"],
  name = "us-app",
  region = "nyc3"
}

# Update AWS AMI IDs for your region
app_servers = aws.EC2 & {
  image_id = "ami-09d56f8956ab235b7",
  instance_type = "t3.medium",
  region = "ap-southeast-1"
}

# Update certificate ID
load_balancer = digitalocean.LoadBalancer & {
  forwarding_rules = [{
    certificate_id = "your-certificate-id",
    entry_protocol = "https",
    entry_port = 443
  }]
}

Edit config.toml:

# Update regional names if different
[providers.digitalocean]
region_name = "us-east"

[providers.hetzner]
region_name = "eu-central"

[providers.aws]
region_name = "asia-southeast"

# Update domain
[dns]
domain = "api.example.com"

Step 2: Validate Configuration

# Validate Nickel syntax
nickel export workspace.ncl | jq . > /dev/null

# Verify credentials per provider
doctl auth init --access-token $DIGITALOCEAN_TOKEN
hcloud context use default
aws sts get-caller-identity

# Check connectivity
doctl account get
hcloud server list
aws ec2 describe-regions

Step 3: Deploy

# Make script executable
chmod +x deploy.nu

# Execute deployment (step-by-step)
./deploy.nu

# Or with debug output
./deploy.nu --debug

# Or deploy per region
./deploy.nu --region us-east
./deploy.nu --region eu-central
./deploy.nu --region asia-southeast

Step 4: Verify Global Deployment

# List resources per region
echo "=== US EAST (DigitalOcean) ==="
doctl compute droplet list --format Name,Region,Status,PublicIPv4
doctl compute load-balancer list

echo "=== EU CENTRAL (Hetzner) ==="
hcloud server list

echo "=== ASIA PACIFIC (AWS) ==="
aws ec2 describe-instances --region ap-southeast-1 \
  --query 'Reservations[*].Instances[*].[InstanceId,InstanceType,State.Name,PublicIpAddress]' \
  --output table
aws elbv2 describe-load-balancers --region ap-southeast-1

Post-Deployment Configuration

1. SSL/TLS Certificates

AWS Certificate Manager

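# Request a certificate for the ALB. ACM certificates are regional, so request it
# in ap-southeast-1, where the ALB runs (the DigitalOcean and Hetzner load
# balancers need certificates from their own providers)
aws acm request-certificate \
  --domain-name api.example.com \
  --subject-alternative-names '*.api.example.com' \
  --validation-method DNS \
  --region ap-southeast-1

# Get certificate ARN
aws acm list-certificates --region ap-southeast-1

# Note the ARN for workspace.ncl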

2. Database Primary/Replica Setup

# Connect to US East primary
PGPASSWORD=admin psql -h us-db-primary.abc123.us-east-1.rds.amazonaws.com -U admin -d postgres

# Create a replication role for the EU and APAC replicas
CREATE ROLE replication_user WITH REPLICATION LOGIN PASSWORD 'replica_password';

# On the US East primary - verify that each replica holds a replication slot
SELECT slot_name, active, restart_lsn, confirmed_flush_lsn FROM pg_replication_slots;

# On each read replica (EU on Hetzner, APAC on AWS RDS) - verify it is in recovery and replaying WAL
SELECT pg_is_in_recovery(), pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn();

3. Global DNS Setup

# Create Route53 records for each region
aws route53 change-resource-record-sets \
  --hosted-zone-id Z1234567890ABC \
  --change-batch '{
    "Changes": [
      {
        "Action": "CREATE",
        "ResourceRecordSet": {
          "Name": "us.api.example.com",
          "Type": "A",
          "TTL": 60,
          "ResourceRecords": [{"Value": "198.51.100.15"}]
        }
      },
      {
        "Action": "CREATE",
        "ResourceRecordSet": {
          "Name": "eu.api.example.com",
          "Type": "A",
          "TTL": 60,
          "ResourceRecords": [{"Value": "192.0.2.100"}]
        }
      },
      {
        "Action": "CREATE",
        "ResourceRecordSet": {
          "Name": "asia.api.example.com",
          "Type": "A",
          "TTL": 60,
          "ResourceRecords": [{"Value": "203.0.113.50"}]
        }
      }
    ]
  }'

# Health checks per region (repeat for eu.api.example.com and asia.api.example.com)
aws route53 create-health-check \
  --caller-reference "us-health-$(date +%s)" \
  --health-check-config '{
    "Type": "HTTPS",
    "ResourcePath": "/health",
    "FullyQualifiedDomainName": "us.api.example.com",
    "Port": 443,
    "RequestInterval": 30,
    "FailureThreshold": 3
  }'
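
Writing the change batch by hand is error-prone once regions are added or IPs change. The sketch below generates the same JSON structure from a mapping of region prefixes to IPs; the function name is hypothetical, and the IPs are the documentation addresses used above.

```python
import json

# Region prefix -> public IP of that region's load balancer (example addresses).
REGION_IPS = {
    "us": "198.51.100.15",
    "eu": "192.0.2.100",
    "asia": "203.0.113.50",
}


def build_change_batch(domain, region_ips, ttl=60, action="CREATE"):
    """Build a Route53 change batch with one A record per region."""
    changes = []
    for prefix, ip in region_ips.items():
        changes.append({
            "Action": action,
            "ResourceRecordSet": {
                "Name": f"{prefix}.{domain}",
                "Type": "A",
                "TTL": ttl,
                "ResourceRecords": [{"Value": ip}],
            },
        })
    return {"Changes": changes}


if __name__ == "__main__":
    print(json.dumps(build_change_batch("api.example.com", REGION_IPS), indent=2))
```

Saving the output to a file such as batch.json lets the CLI read it with `--change-batch file://batch.json`.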

4. Application Deployment

SSH to web servers in each region:

# US East
US_IP=$(doctl compute droplet get us-app-1 --format PublicIPv4 --no-header)
ssh root@$US_IP

# Deploy application
cd /var/www
git clone https://github.com/your-org/app.git
cd app
./deploy.sh

# EU Central
EU_IP=$(hcloud server list --selector region=eu-central -o noheader -o columns=id | head -1 | xargs -I {} hcloud server ip {})
ssh root@$EU_IP

# Asia Pacific
ASIA_IP=$(aws ec2 describe-instances \
  --region ap-southeast-1 \
  --filters "Name=tag:Name,Values=asia-app-1" \
  --query 'Reservations[0].Instances[0].PublicIpAddress' \
  --output text)
ssh -i provisioning-key.pem ec2-user@$ASIA_IP

Monitoring and Health Checks

Regional Monitoring

Each region reports metrics to its provider's monitoring service (CloudWatch on AWS):

# DigitalOcean metrics
doctl monitoring metrics list droplet \
  --droplet-id 123456789 \
  --metric cpu

# Hetzner metrics (manual monitoring)
hcloud server list

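# AWS CloudWatch (replace the instance ID with one of the APAC instances)
aws cloudwatch get-metric-statistics \
  --region ap-southeast-1 \
  --metric-name CPUUtilization \
  --namespace AWS/EC2 \
  --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
  --start-time 2024-01-01T00:00:00Z \
  --end-time 2024-01-02T00:00:00Z \
  --period 300 \
  --statistics Average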

Global Health Checks

Route53 health checks verify all regions are healthy:

# List health checks
aws route53 list-health-checks

# Get detailed status
aws route53 get-health-check-status --health-check-id abc123

# Verify replication lag
# Run on a read replica (EU Central or Asia Pacific); the function returns NULL on the primary
SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;

# Should be less than 300 seconds
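
A monitoring script could map the measured lag onto the thresholds used in this README: a 300-second target and a 600-second critical alert. This is a sketch only; the status labels and function name are assumptions.

```python
from datetime import timedelta
from typing import Optional

TARGET_LAG = timedelta(seconds=300)    # "should be less than 300 seconds"
CRITICAL_LAG = timedelta(seconds=600)  # threshold used by the critical alarm


def lag_status(lag: Optional[timedelta]) -> str:
    """Classify a replication lag value; None means no lag value was available."""
    if lag is None:
        return "unknown"
    if lag < TARGET_LAG:
        return "ok"
    if lag <= CRITICAL_LAG:
        return "warning"
    return "critical"


if __name__ == "__main__":
    for seconds in (12, 450, 900):
        print(seconds, lag_status(timedelta(seconds=seconds)))
```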

Alert Configuration

Configure alerts for critical metrics:

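# CPU > 80% on an APAC EC2 instance (CloudWatch only covers the AWS region)
aws cloudwatch put-metric-alarm \
  --region ap-southeast-1 \
  --alarm-name asia-pacific-high-cpu \
  --alarm-actions arn:aws:sns:ap-southeast-1:123456789012:ops-alerts \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
  --statistic Average \
  --period 300 \
  --evaluation-periods 2 \
  --threshold 80 \
  --comparison-operator GreaterThanThreshold

# Replication lag > 600s on the APAC RDS replica
aws cloudwatch put-metric-alarm \
  --region ap-southeast-1 \
  --alarm-name replication-lag-critical \
  --namespace AWS/RDS \
  --metric-name ReplicaLag \
  --dimensions Name=DBInstanceIdentifier,Value=asia-db-replica \
  --statistic Average \
  --period 60 \
  --evaluation-periods 5 \
  --threshold 600 \
  --comparison-operator GreaterThanThreshold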

Failover Testing

Planned Failover - US East to EU Central

# 1. Stop traffic to US East
aws route53 change-resource-record-sets \
  --hosted-zone-id Z1234567890ABC \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "api.example.com",
        "Type": "A",
        "TTL": 60,
        "ResourceRecords": [{"Value": "192.0.2.100"}]
      }
    }]
  }'

# 2. Promote EU Central to primary
# Connect to EU read replica and promote
psql -h hetzner-eu-db.netz.de -U admin -d postgres \
  -c "SELECT pg_promote();"

# 3. Verify failover
curl https://api.example.com/health

# 4. Monitor replication (now from EU); run this on a replica that follows the new primary
SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;

Automatic Failover - Health Check Failure

Route53 stops routing traffic to a region whose health check fails, provided that region's records are associated with the health check:

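# Simulate US East failure (for testing only)
# Stop web servers temporarily (droplet-action takes one droplet ID per call)
US_IDS=$(doctl compute droplet list --format ID,Name --no-header | awk '$2 ~ /^us-app-/ {print $1}')
for id in $US_IDS; do doctl compute droplet-action power-off "$id"; done

# Wait for 3 failed checks at 30s intervals (~90s) plus the 60s DNS TTL
sleep 150

# Verify traffic now routes to EU/APAC (curl -v prints response headers on stderr)
curl -sv https://api.example.com/ -o /dev/null 2>&1 | grep -iE "^< server"

# Restore US East
for id in $US_IDS; do doctl compute droplet-action power-on "$id"; done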
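
How long clients keep hitting a failed region depends on the health check settings and the record TTL. A rough estimate, using the values configured earlier in this README (30-second interval, failure threshold of 3, 60-second TTL) and ignoring resolvers that cache beyond the TTL:

```python
def worst_case_failover_seconds(request_interval: int = 30,
                                failure_threshold: int = 3,
                                dns_ttl: int = 60) -> int:
    """Estimate the seconds until clients stop resolving to a failed region.

    Detection takes roughly interval * threshold; cached DNS answers can then
    live for up to one more TTL.
    """
    return request_interval * failure_threshold + dns_ttl


if __name__ == "__main__":
    print(f"estimated failover window: {worst_case_failover_seconds()}s")
```

With these defaults the estimate is 150 seconds, so a failover test should wait at least that long before checking where traffic lands.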

Scaling and Upgrades

Add More Web Servers

Edit workspace.ncl:

# Increase droplet count
region_us_east.app_servers = digitalocean.Droplet & {
  count = 5,
  name = "us-app",
  region = "nyc3"
}

# Increase Hetzner servers
region_eu_central.app_servers = hetzner.Server & {
  count = 5,
  server_type = "cpx21",
  location = "nbg1"
}

# Increase AWS EC2 instances
region_asia_southeast.app_servers = aws.EC2 & {
  count = 5,
  instance_type = "t3.medium",
  region = "ap-southeast-1"
}

Redeploy:

./deploy.nu --region us-east
./deploy.nu --region eu-central
./deploy.nu --region asia-southeast

Upgrade Database Instance Class

Edit workspace.ncl:

# US East primary
database = digitalocean.Database & {
  size = "db-s-4vcpu-8gb",
  name = "us-db-primary",
  engine = "pg"
}

DigitalOcean handles the upgrade with minimal downtime.

Upgrade EC2 Instances

# Stop instances for upgrade (rolling)
aws ec2 stop-instances --region ap-southeast-1 --instance-ids i-1234567890abcdef0

# Wait for stop
aws ec2 wait instance-stopped --region ap-southeast-1 --instance-ids i-1234567890abcdef0

# Modify instance type (the attribute takes a Value= structure)
aws ec2 modify-instance-attribute \
  --region ap-southeast-1 \
  --instance-id i-1234567890abcdef0 \
  --instance-type Value=t3.large

# Start instance
aws ec2 start-instances --region ap-southeast-1 --instance-ids i-1234567890abcdef0

Cost Optimization

Monthly Cost Breakdown

Component       US East   EU Central        Asia Pacific   Total
Compute         $72       €62.70 (~$69)     $80            ~$221
Database        $30       Read replica      $30            $60
Load Balancer   Free      ~$10              ~$20           ~$30
Total           $102      ~$79              $130           ~$311
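
The totals can be cross-checked by summing the per-region figures. The sketch below assumes EU compute of €62.70 converts to roughly $69 (the rate implied by the ~$79 regional total) and treats the free load balancer and the EU read replica as $0.

```python
# Monthly cost per region and component, in USD.
# Assumption: EU compute of EUR 62.70 is taken as roughly USD 69.
COSTS_USD = {
    "us-east": {"compute": 72, "database": 30, "load_balancer": 0},
    "eu-central": {"compute": 69, "database": 0, "load_balancer": 10},
    "asia-pacific": {"compute": 80, "database": 30, "load_balancer": 20},
}


def region_totals(costs):
    """Sum all components for each region."""
    return {region: sum(items.values()) for region, items in costs.items()}


def component_totals(costs):
    """Sum each component across all regions."""
    totals = {}
    for items in costs.values():
        for component, amount in items.items():
            totals[component] = totals.get(component, 0) + amount
    return totals


if __name__ == "__main__":
    print(region_totals(COSTS_USD))
    print(component_totals(COSTS_USD))
    print("grand total:", sum(region_totals(COSTS_USD).values()))
```

Under these assumptions the regions come to $102, $79 and $130, for a grand total of $311 per month.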

Optimization Strategies

  1. Reduce instance count from 3 to 2 (saves ~$30-40/month)
  2. Downsize compute to s-1vcpu-2gb (saves ~$20-30/month)
  3. Use Reserved Instances on AWS (saves ~20-30%)
  4. Optimize data transfer between regions
  5. Review backups and retention settings

Monitor Costs

# DigitalOcean (current balance and month-to-date usage)
doctl balance get

# AWS Cost Explorer (the End date is exclusive)
aws ce get-cost-and-usage \
  --time-period Start=2024-01-01,End=2024-02-01 \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --group-by Type=DIMENSION,Key=SERVICE

# Hetzner (manual via console)
# https://console.hetzner.cloud/billing

Troubleshooting

Issue: One Region Not Responding

Diagnosis:

# Check health checks
aws route53 get-health-check-status --health-check-id abc123

# Test regional endpoints
curl -v https://us.api.example.com/health
curl -v https://eu.api.example.com/health
curl -v https://asia.api.example.com/health

Solution:

  • Check web server status in affected region
  • Verify load balancer is healthy
  • Review security groups/firewall rules
  • Check application logs on web servers
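
The endpoint checks above can be combined into a single probe that reports which regions are failing. This is a minimal sketch using only the standard library; the endpoint URLs are the example hostnames from this README, and the function names are hypothetical.

```python
import urllib.request

# Regional health endpoints (example hostnames from this README).
ENDPOINTS = {
    "us-east": "https://us.api.example.com/health",
    "eu-central": "https://eu.api.example.com/health",
    "asia-southeast": "https://asia.api.example.com/health",
}


def is_healthy(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if the endpoint answers with HTTP 200 within the timeout."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def unhealthy_regions(endpoints, probe=is_healthy):
    """Return the regions whose health endpoint did not answer with 200."""
    return [region for region, url in endpoints.items() if not probe(url)]
```

Calling `unhealthy_regions(ENDPOINTS)` returns an empty list when every region responds; any region it names is the place to start with the checks listed above.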

Issue: High Replication Lag

Diagnosis:

# Check replication lag (run on a replica; the function returns NULL on the primary)
psql -h hetzner-eu-db.netz.de -U admin -d postgres \
  -c "SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;"

# Check replication slots
psql -h us-db-primary.abc123.us-east-1.rds.amazonaws.com -U admin -d postgres \
  -c "SELECT * FROM pg_replication_slots;"

Solution:

  • Check network connectivity between regions
  • Verify VPN tunnels are operational
  • Reduce write load on primary
  • Monitor network bandwidth
  • May need larger database instance

Issue: VPN Tunnel Down

Diagnosis:

# Check VPN connection status (the AWS side of the tunnels lives in ap-southeast-1)
aws ec2 describe-vpn-connections --region ap-southeast-1

# Test connectivity between regions
ssh hetzner-server "ping -c 4 10.0.0.1"

Solution:

  • Reconnect VPN tunnel manually
  • Verify tunnel configuration
  • Check security groups allow necessary ports
  • Review ISP routing

Cleanup

To destroy all resources (use carefully):

# DigitalOcean
doctl compute droplet delete --force us-app-1 us-app-2 us-app-3
doctl compute load-balancer delete --force us-lb
# Managed databases are deleted by cluster ID (see: doctl databases list)
doctl databases delete --force "$US_DB_ID"

# Hetzner
hcloud server delete hetzner-eu-1 hetzner-eu-2 hetzner-eu-3
hcloud load-balancer delete eu-lb
hcloud volume delete eu-backups

# AWS
aws ec2 terminate-instances --region ap-southeast-1 --instance-ids i-xxxxx
aws elbv2 delete-load-balancer --region ap-southeast-1 --load-balancer-arn arn:aws:elasticloadbalancing:ap-southeast-1:123456789:loadbalancer/app/asia-lb/1234567890abcdef
aws rds delete-db-instance --region ap-southeast-1 --db-instance-identifier asia-db-replica --skip-final-snapshot

# Route53
aws route53 delete-health-check --health-check-id abc123
aws route53 delete-hosted-zone --id Z1234567890ABC

Next Steps

  1. Disaster Recovery Testing: Regular failover drills
  2. Auto-scaling: Add provider-specific autoscaling
  3. Monitoring Integration: Connect to centralized monitoring (Datadog, New Relic, Prometheus)
  4. Backup Automation: Implement cross-region backups
  5. Cost Optimization: Review and tune resource sizing
  6. Security Hardening: Implement WAF, DDoS protection
  7. Load Testing: Validate performance across regions

Support

For issues or questions:

  • Review the multi-provider networking guide
  • Check provider-specific documentation
  • Review regional deployment logs: ./deploy.nu --debug
  • Test regional endpoints independently

Files

  • workspace.ncl: Global infrastructure definition (Nickel)
  • config.toml: Provider credentials and regional settings
  • deploy.nu: Multi-region deployment orchestration (Nushell)
  • README.md: This file