# Multi-Region High Availability Workspace
This workspace demonstrates a production-ready global high availability deployment spanning three cloud providers across three geographic regions:
- **US East (DigitalOcean NYC)**: Primary region - active serving, primary database
- **EU Central (Hetzner Germany)**: Secondary region - active serving, read replicas
- **Asia Pacific (AWS Singapore)**: Tertiary region - active serving, read replicas
## Why Multi-Region High Availability?
### Business Benefits
- **99.99% Uptime**: Automatic failover across regions
- **Low Latency**: Users served from geographically closest region
- **Compliance**: Data residency in specific regions (GDPR for EU)
- **Disaster Recovery**: Complete regional failure tolerance
### Technical Benefits
- **Load Distribution**: Traffic spread across 3 regions
- **Cost Optimization**: Pay only for actual usage (~$311/month)
- **Provider Diversity**: Reduces vendor lock-in risk
- **Capacity Planning**: Scale independently per region
## Architecture Overview
```text
┌─────────────────────────────────────────────────────────────────┐
│                       Global Route53 DNS                        │
│               Geographic Routing + Health Checks                │
└────────────────────┬────────────────────────────────────────────┘
         ┌───────────┼───────────┐
         │           │           │
    ┌────▼─────┐  ┌──▼────────┐ ┌▼──────────┐
    │    US    │  │    EU     │ │   APAC    │
    │ Primary  │  │ Secondary │ │ Tertiary  │
    └────┬─────┘  └──┬────────┘ └┬──────────┘
         │           │           │
    ┌────▼───────────▼───────────▼────┐
    │    Primary/Replica Database     │
    │     Replication (300s lag)      │
    └────┬───────────┬───────────┬────┘
         │           │           │
    ┌────▼───────────▼───────────▼────┐
    │DO Droplets  Hetzner       AWS   │
    │ 3 x nyc3    3 x nbg1    3 x sgp1│
    │   Load Balancer (per region)    │
    └────┬───────────────────────┬────┘
         │  VPN Tunnels (IPSec)  │
         └───────────────────────┘
```
### Regional Components
#### US East (DigitalOcean) - Primary
```text
Region: nyc3 (New York)
Compute: 3x Droplets (s-2vcpu-4gb)
Load Balancer: Round-robin with health checks
Database: PostgreSQL (3-node cluster, Multi-AZ)
Network: VPC 10.0.0.0/16
Cost: ~$102/month
```
#### EU Central (Hetzner) - Secondary
```text
Region: nbg1 (Nuremberg, Germany)
Compute: 3x CPX21 servers (3 vCPU, 4GB RAM)
Load Balancer: Hetzner Load Balancer
Database: Read-only replica (lag: 300s)
Network: vSwitch 10.1.0.0/16
Cost: ~$79/month (€72.70)
```
#### Asia Pacific (AWS) - Tertiary
```text
Region: ap-southeast-1 (Singapore)
Compute: 3x EC2 t3.medium instances
Load Balancer: Application Load Balancer (ALB)
Database: RDS read-only replica (lag: 300s)
Network: VPC 10.2.0.0/16
Cost: ~$130/month
```
## Prerequisites
### 1. Cloud Accounts & Credentials
#### DigitalOcean
```bash
# Create API token
# Dashboard → API → Tokens/Keys → Generate New Token
# Scopes: read, write
export DIGITALOCEAN_TOKEN="dop_v1_abc123def456ghi789jkl012mno"
```
#### Hetzner
```bash
# Create API token
# Dashboard → Security → API Tokens → Generate Token
export HCLOUD_TOKEN="MC4wNTI1YmE1M2E4YmE0YTQzMTQyZTdlODYy"
```
#### AWS
```bash
# Create IAM user with programmatic access
# IAM → Users → Add User → Check "Programmatic access"
# Attach policies: AmazonEC2FullAccess, AmazonRDSFullAccess, Route53FullAccess
export AWS_ACCESS_KEY_ID="AKIA1234567890ABCDEF"
export AWS_SECRET_ACCESS_KEY="wJalrXUtnFEMI/K7MDENG+j/zI0m1234567890ab"
```
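Before moving on, it can help to confirm that all four variables are actually set in the current shell. The following is a minimal sketch; the `missing_vars` helper is hypothetical and not part of this workspace.

```bash
# Hypothetical helper: print the name of every listed variable that is
# unset or empty in the current shell.
missing_vars() {
  for name in "$@"; do
    # Indirect lookup by variable name (POSIX sh compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name"
    fi
  done
}

# Variable names are the ones exported in this section.
REQUIRED_VARS="DIGITALOCEAN_TOKEN HCLOUD_TOKEN AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY"

# REQUIRED_VARS is left unquoted on purpose so it splits into separate names.
missing=$(missing_vars $REQUIRED_VARS)
if [ -n "$missing" ]; then
  echo "Missing credentials:" $missing
else
  echo "All provider credentials are set"
fi
```

The check only looks at whether each value is non-empty. It does not verify that a token is valid; the provider commands in the validation step do that.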
### 2. CLI Tools
```bash
# Verify all CLIs are installed
which doctl
which hcloud
which aws
which nickel
# Versions
doctl version # >= 1.94.0
hcloud version # >= 1.35.0
aws --version # >= 2.0
nickel --version # >= 1.0
```
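The same check can be scripted so that it reports every missing tool at once. This is a small sketch using `command -v`, the portable alternative to `which`; the `missing_tools` helper is hypothetical.

```bash
# Hypothetical helper: print each command name that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Tool names are the four CLIs listed above.
missing=$(missing_tools doctl hcloud aws nickel)
if [ -n "$missing" ]; then
  echo "Install before continuing:" $missing
else
  echo "All required CLIs found"
fi
```

It only checks presence, not versions, so the version commands above are still worth running once.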
### 3. SSH Keys
#### DigitalOcean
```bash
# Upload SSH key
doctl compute ssh-key import provisioning-key \
  --public-key-file ~/.ssh/id_rsa.pub
# Note the key ID
doctl compute ssh-key list
```
#### Hetzner
```bash
# Upload SSH key
hcloud ssh-key create \
  --name provisioning-key \
  --public-key-from-file ~/.ssh/id_rsa.pub
# List keys
hcloud ssh-key list
```
#### AWS
```bash
# Create or import EC2 key pair
aws ec2 create-key-pair \
  --key-name provisioning-key \
  --query 'KeyMaterial' --output text > provisioning-key.pem
chmod 600 provisioning-key.pem
```
### 4. Domain and DNS
You need a domain hosted in Route53, or the ability to create DNS records with your DNS provider:
```bash
# Create hosted zone in Route53
aws route53 create-hosted-zone \
  --name api.example.com \
  --caller-reference "$(date +%s)"
# Note the Zone ID for updates
aws route53 list-hosted-zones
```
## Deployment
### Step 1: Configure the Workspace
Edit `workspace.ncl` to customize:
```nickel
# Update SSH key references
droplets = digitalocean.Droplet & {
ssh_keys = ["YOUR_DO_KEY_ID"],
name = "us-app",
region = "nyc3"
}
# Update AWS AMI IDs for your region
app_servers = aws.EC2 & {
image_id = "ami-09d56f8956ab235b7",
instance_type = "t3.medium",
region = "ap-southeast-1"
}
# Update certificate ID
load_balancer = digitalocean.LoadBalancer & {
forwarding_rules = [{
certificate_id = "your-certificate-id",
entry_protocol = "https",
entry_port = 443
}]
}
```
Edit `config.toml`:
```toml
# Update regional names if different
[providers.digitalocean]
region_name = "us-east"
[providers.hetzner]
region_name = "eu-central"
[providers.aws]
region_name = "asia-southeast"
# Update domain
[dns]
domain = "api.example.com"
```
### Step 2: Validate Configuration
```bash
# Validate Nickel syntax
nickel export workspace.ncl | jq . > /dev/null
# Verify credentials per provider
doctl auth init --access-token "$DIGITALOCEAN_TOKEN"
hcloud context use default
aws sts get-caller-identity
# Check connectivity
doctl account get
hcloud server list
aws ec2 describe-regions
```
### Step 3: Deploy
```bash
# Make script executable
chmod +x deploy.nu
# Execute deployment (step-by-step)
./deploy.nu
# Or with debug output
./deploy.nu --debug
# Or deploy per region
./deploy.nu --region us-east
./deploy.nu --region eu-central
./deploy.nu --region asia-southeast
```
### Step 4: Verify Global Deployment
```bash
# List resources per region
echo "=== US EAST (DigitalOcean) ==="
doctl compute droplet list --format Name,Region,Status,PublicIPv4
doctl compute load-balancer list
echo "=== EU CENTRAL (Hetzner) ==="
hcloud server list
echo "=== ASIA PACIFIC (AWS) ==="
aws ec2 describe-instances --region ap-southeast-1 \
  --query 'Reservations[*].Instances[*].[InstanceId,InstanceType,State.Name,PublicIpAddress]' \
  --output table
aws elbv2 describe-load-balancers --region ap-southeast-1
```
## Post-Deployment Configuration
### 1. SSL/TLS Certificates
#### AWS Certificate Manager
```bash
# Request certificate (ACM certificates are regional; repeat in each region that terminates TLS)
aws acm request-certificate \
  --domain-name api.example.com \
  --subject-alternative-names "*.api.example.com" \
  --validation-method DNS \
  --region us-east-1
# Get certificate ARN
aws acm list-certificates --region us-east-1
# Note the ARN for workspace.ncl
```
### 2. Database Primary/Replica Setup
```bash
# Connect to US East primary
PGPASSWORD=admin psql -h us-db-primary.abc123.us-east-1.rds.amazonaws.com -U admin -d postgres
# Inside psql on the primary: create the replication user for the EU and APAC replicas
CREATE ROLE replication_user WITH REPLICATION LOGIN PASSWORD 'replica_password';
# Inside psql on the primary: list replication slots (slots live on the primary)
SELECT slot_name, restart_lsn, confirmed_flush_lsn FROM pg_replication_slots;
# Inside psql on each read replica (EU Hetzner, APAC AWS RDS): verify it is in recovery and replaying WAL
SELECT pg_is_in_recovery(), pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn();
```
### 3. Global DNS Setup
```bash
# Create Route53 records for each region
aws route53 change-resource-record-sets \
  --hosted-zone-id Z1234567890ABC \
--change-batch '{
"Changes": [
{
"Action": "CREATE",
"ResourceRecordSet": {
"Name": "us.api.example.com",
"Type": "A",
"TTL": 60,
"ResourceRecords": [{"Value": "198.51.100.15"}]
}
},
{
"Action": "CREATE",
"ResourceRecordSet": {
"Name": "eu.api.example.com",
"Type": "A",
"TTL": 60,
"ResourceRecords": [{"Value": "192.0.2.100"}]
}
},
{
"Action": "CREATE",
"ResourceRecordSet": {
"Name": "asia.api.example.com",
"Type": "A",
"TTL": 60,
"ResourceRecords": [{"Value": "203.0.113.50"}]
}
}
]
}'
# Health checks per region
aws route53 create-health-check \
  --caller-reference "$(date +%s)" \
  --health-check-config '{
"Type": "HTTPS",
"ResourcePath": "/health",
"FullyQualifiedDomainName": "us.api.example.com",
"Port": 443,
"RequestInterval": 30,
"FailureThreshold": 3
}'
```
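The three record entries above differ only in name and IP, so the JSON can be generated instead of written out by hand. The following is a sketch under stated assumptions: `region_record_batch` is a hypothetical helper, and it uses `UPSERT` rather than `CREATE` so that re-running it does not fail when the record already exists.

```bash
# Hypothetical helper: print a Route53 change batch that upserts one A record.
# Arguments: record name, IPv4 address, optional TTL (defaults to 60).
region_record_batch() {
  name=$1
  ip=$2
  ttl=${3:-60}
  cat <<EOF
{
  "Changes": [{
    "Action": "UPSERT",
    "ResourceRecordSet": {
      "Name": "$name",
      "Type": "A",
      "TTL": $ttl,
      "ResourceRecords": [{"Value": "$ip"}]
    }
  }]
}
EOF
}

# Print the batch for the US record, using the example IP from above.
region_record_batch us.api.example.com 198.51.100.15
```

The output can be passed to the CLI as `--change-batch "$(region_record_batch us.api.example.com 198.51.100.15)"` together with the same `--hosted-zone-id` used above.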
### 4. Application Deployment
SSH to web servers in each region:
```bash
# US East
US_IP=$(doctl compute droplet get us-app-1 --format PublicIPv4 --no-header)
ssh root@$US_IP
# Deploy application
cd /var/www
git clone https://github.com/your-org/app.git
cd app
./deploy.sh
# EU Central
EU_IP=$(hcloud server list --selector region=eu-central -o noheader -o columns=id | head -1 | xargs -I {} hcloud server ip {})
ssh root@$EU_IP
# Asia Pacific
ASIA_IP=$(aws ec2 describe-instances \
  --region ap-southeast-1 \
  --filters "Name=tag:Name,Values=asia-app-1" \
  --query 'Reservations[0].Instances[0].PublicIpAddress' \
  --output text)
ssh -i provisioning-key.pem ec2-user@$ASIA_IP
```
## Monitoring and Health Checks
### Regional Monitoring
Each region sends metrics to its provider's monitoring service (CloudWatch on AWS):
```bash
# DigitalOcean metrics
doctl monitoring metrics list droplet \
  --droplet-id 123456789 \
  --metric cpu
# Hetzner metrics (manual monitoring)
hcloud server list
# AWS CloudWatch
aws cloudwatch get-metric-statistics \
  --metric-name CPUUtilization \
  --namespace AWS/EC2 \
  --start-time 2024-01-01T00:00:00Z \
  --end-time 2024-01-02T00:00:00Z \
  --period 300 \
  --statistics Average
```
### Global Health Checks
Route53 health checks verify all regions are healthy:
```bash
# List health checks
aws route53 list-health-checks
# Get detailed status
aws route53 get-health-check-status --health-check-id abc123
# Verify replication lag
# On a read replica (EU or APAC), inside psql; this returns NULL on the primary
SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;
# Should be less than 300 seconds
```
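To turn the lag value into something a script can act on, the measurement can be mapped onto the thresholds used in this README (under 300 seconds expected, over 600 seconds critical). This is an illustrative sketch; `lag_status` is a hypothetical helper and the intermediate WARNING band is an assumption.

```bash
# Hypothetical helper: classify a replication lag given in whole seconds.
# Below 300s is the expected range; above 600s matches the critical alarm.
lag_status() {
  lag=$1
  if [ "$lag" -lt 300 ]; then
    echo "OK"
  elif [ "$lag" -le 600 ]; then
    echo "WARNING"
  else
    echo "CRITICAL"
  fi
}

lag_status 45    # prints OK
lag_status 450   # prints WARNING
lag_status 900   # prints CRITICAL
```

One way to obtain the input is to run `SELECT EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp())::int;` through `psql -At` on a replica and pass the result to `lag_status`.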
### Alert Configuration
Configure alerts for critical metrics:
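```bash
# CPU > 80% (period and evaluation-periods are example values)
aws cloudwatch put-metric-alarm \
  --alarm-name us-east-high-cpu \
  --alarm-actions arn:aws:sns:us-east-1:123456:ops-alerts \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --statistic Average \
  --period 300 \
  --evaluation-periods 2 \
  --threshold 80 \
  --comparison-operator GreaterThanThreshold
# Replication lag > 600s (AWS/RDS publishes replica lag as ReplicaLag)
aws cloudwatch put-metric-alarm \
  --alarm-name replication-lag-critical \
  --namespace AWS/RDS \
  --metric-name ReplicaLag \
  --statistic Average \
  --period 300 \
  --evaluation-periods 2 \
  --threshold 600 \
  --comparison-operator GreaterThanThreshold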
```
## Failover Testing
### Planned Failover - US East to EU Central
```bash
# 1. Stop traffic to US East
aws route53 change-resource-record-sets \
  --hosted-zone-id Z1234567890ABC \
--change-batch '{
"Changes": [{
"Action": "UPSERT",
"ResourceRecordSet": {
"Name": "api.example.com",
"Type": "A",
"TTL": 60,
"ResourceRecords": [{"Value": "192.0.2.100"}]
}
}]
}'
# 2. Promote EU Central to primary
# Connect to EU read replica and promote
psql -h hetzner-eu-db.netz.de -U admin -d postgres \
  -c "SELECT pg_promote();"
# 3. Verify failover
curl https://api.example.com/health
# 4. Monitor replication lag (run inside psql on a remaining replica)
SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;
```
### Automatic Failover - Health Check Failure
Route53 automatically fails over when health checks fail:
```bash
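# Simulate US East failure (for testing only)
# Power off the web servers temporarily (droplet-action takes one droplet ID per call)
US_IDS=$(doctl compute droplet list --format ID,Name --no-header | awk '$2 ~ /^us-app-/ {print $1}')
for id in $US_IDS; do doctl compute droplet-action power-off "$id"; done
# Wait for the health check to fail (30s interval x 3 failures = ~90s) and the 60s TTL to expire
sleep 150
# Verify traffic now routes to EU/APAC (curl -v writes headers to stderr)
curl -sv https://api.example.com/ 2>&1 | grep -iE "^< server"
# Restore US East
for id in $US_IDS; do doctl compute droplet-action power-on "$id"; done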
```
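How long a test like this needs to wait follows from the health check settings and the record TTL used earlier (`RequestInterval` 30, `FailureThreshold` 3, TTL 60). The arithmetic below is a rough estimate only; it ignores resolver behavior and any delay inside Route53 itself.

```bash
# Values taken from the health check and A record examples in this README.
REQUEST_INTERVAL=30   # seconds between health check requests
FAILURE_THRESHOLD=3   # consecutive failures before the endpoint is unhealthy
DNS_TTL=60            # seconds a resolver may cache the old answer

# Time to mark the endpoint unhealthy, then time for cached answers to expire.
detection=$((REQUEST_INTERVAL * FAILURE_THRESHOLD))
worst_case=$((detection + DNS_TTL))

echo "Detection: ${detection}s, estimated client switchover: ${worst_case}s"
```

With these values detection takes about 90 seconds and clients may take up to about 150 seconds to switch, so a wait shorter than that can produce a false negative in the test.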
## Scaling and Upgrades
### Add More Web Servers
Edit `workspace.ncl`:
```nickel
# Increase droplet count
region_us_east.app_servers = digitalocean.Droplet & {
count = 5,
name = "us-app",
region = "nyc3"
}
# Increase Hetzner servers
region_eu_central.app_servers = hetzner.Server & {
count = 5,
server_type = "cpx21",
location = "nbg1"
}
# Increase AWS EC2 instances
region_asia_southeast.app_servers = aws.EC2 & {
count = 5,
instance_type = "t3.medium",
region = "ap-southeast-1"
}
```
Redeploy:
```bash
./deploy.nu --region us-east
./deploy.nu --region eu-central
./deploy.nu --region asia-southeast
```
### Upgrade Database Instance Class
Edit `workspace.ncl`:
```nickel
# US East primary
database = digitalocean.Database & {
size = "db-s-4vcpu-8gb",
name = "us-db-primary",
engine = "pg"
}
```
DigitalOcean handles the upgrade with minimal downtime.
### Upgrade EC2 Instances
```bash
# Stop instances for upgrade (rolling)
aws ec2 stop-instances --region ap-southeast-1 --instance-ids i-1234567890abcdef0
# Wait for stop
aws ec2 wait instance-stopped --region ap-southeast-1 --instance-ids i-1234567890abcdef0
# Modify instance type
aws ec2 modify-instance-attribute \
  --region ap-southeast-1 \
  --instance-id i-1234567890abcdef0 \
  --instance-type "{\"Value\": \"t3.large\"}"
# Start instance
aws ec2 start-instances --region ap-southeast-1 --instance-ids i-1234567890abcdef0
```
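For a rolling upgrade, the same four steps are repeated one instance at a time so the remaining instances keep serving traffic. The sketch below wraps them in a loop with a dry-run mode. The `run` and `resize_instance` helpers are hypothetical, and the second and third instance IDs are placeholders.

```bash
# DRY_RUN=1 (the default here) only prints the commands.
# Set DRY_RUN=0 to execute them with the AWS CLI.
DRY_RUN=${DRY_RUN:-1}
REGION=ap-southeast-1

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Resize one instance: stop, wait, change type, start, wait until running.
resize_instance() {
  id=$1
  new_type=$2
  run aws ec2 stop-instances --region "$REGION" --instance-ids "$id"
  run aws ec2 wait instance-stopped --region "$REGION" --instance-ids "$id"
  run aws ec2 modify-instance-attribute --region "$REGION" \
    --instance-id "$id" --instance-type "{\"Value\": \"$new_type\"}"
  run aws ec2 start-instances --region "$REGION" --instance-ids "$id"
  run aws ec2 wait instance-running --region "$REGION" --instance-ids "$id"
}

# One instance at a time; stop at the first failure.
# The last two IDs are placeholders, replace them with real instance IDs.
for id in i-1234567890abcdef0 i-0aaaaaaaaaaaaaaa1 i-0bbbbbbbbbbbbbbb2; do
  resize_instance "$id" t3.large || break
done
```

Reviewing the dry-run output first is a cheap way to confirm the instance IDs and target type before anything is stopped.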
## Cost Optimization
### Monthly Cost Breakdown
| Component | US East | EU Central | Asia Pacific | Total |
| ----------- | --------- | ----------- | -------------- | ------- |
| Compute | $72 | €62.70 (~$68) | $80 | ~$220 |
| Database | $30 | Read Replica | $30 | $60 |
| Load Balancer | Free | ~€10 (~$11) | ~$20 | ~$31 |
| **Total** | **$102** | **~$79** | **$130** | **~$311** |
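
The regional totals can be cross-checked with a few lines of shell arithmetic. This sketch uses the whole-dollar figures from the table, with the Hetzner amount taken as its approximate USD equivalent.

```bash
# Per-region monthly totals from the table above, in whole US dollars.
US_EAST=102
EU_CENTRAL=79     # approximate USD equivalent of the Hetzner bill
ASIA_PACIFIC=130

TOTAL=$((US_EAST + EU_CENTRAL + ASIA_PACIFIC))
echo "Estimated monthly total: \$${TOTAL}"

# Share of the total per region, in percent (integer division, rounds down).
for entry in "US East:$US_EAST" "EU Central:$EU_CENTRAL" "Asia Pacific:$ASIA_PACIFIC"; do
  name=${entry%%:*}
  cost=${entry##*:}
  echo "$name: $((cost * 100 / TOTAL))%"
done
```

The sum comes to $311, matching the total in the table.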
### Optimization Strategies
1. Reduce instance count from 3 to 2 (saves ~$30-40/month)
2. Downsize compute to s-1vcpu-2gb (saves ~$20-30/month)
3. Use Reserved Instances on AWS (saves ~20-30%)
4. Optimize data transfer between regions
5. Review backups and retention settings
### Monitor Costs
```bash
# DigitalOcean
doctl balance get
# AWS Cost Explorer (the End date is exclusive)
aws ce get-cost-and-usage \
  --time-period Start=2024-01-01,End=2024-02-01 \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --group-by Type=DIMENSION,Key=SERVICE
# Hetzner (manual via console)
# https://console.hetzner.cloud/billing
```
## Troubleshooting
### Issue: One Region Not Responding
**Diagnosis**:
```bash
# Check health checks
aws route53 get-health-check-status --health-check-id abc123
# Test regional endpoints
curl -v https://us.api.example.com/health
curl -v https://eu.api.example.com/health
curl -v https://asia.api.example.com/health
```
**Solution**:
- Check web server status in affected region
- Verify load balancer is healthy
- Review security groups/firewall rules
- Check application logs on web servers
### Issue: High Replication Lag
**Diagnosis**:
```bash
# Check replication lag (run against a read replica; the function returns NULL on the primary)
psql -h hetzner-eu-db.netz.de -U admin -d postgres \
  -c "SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;"
# Check replication slots on the primary
psql -h us-db-primary.abc123.us-east-1.rds.amazonaws.com -U admin -d postgres \
  -c "SELECT * FROM pg_replication_slots;"
```
**Solution**:
- Check network connectivity between regions
- Verify VPN tunnels are operational
- Reduce write load on primary
- Monitor network bandwidth
- May need larger database instance
### Issue: VPN Tunnel Down
**Diagnosis**:
```bash
# Check VPN connection status
aws ec2 describe-vpn-connections --region us-east-1
# Test connectivity between regions
ssh hetzner-server "ping 10.0.0.1"
```
**Solution**:
- Reconnect VPN tunnel manually
- Verify tunnel configuration
- Check security groups allow necessary ports
- Review ISP routing
## Cleanup
To destroy all resources (use carefully):
```bash
# DigitalOcean
doctl compute droplet delete --force us-app-1 us-app-2 us-app-3
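# Load balancers and databases are deleted by ID
# (look them up with: doctl compute load-balancer list / doctl databases list)
doctl compute load-balancer delete --force "$US_LB_ID"
doctl databases delete --force "$US_DB_ID"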
# Hetzner
hcloud server delete hetzner-eu-1 hetzner-eu-2 hetzner-eu-3
hcloud load-balancer delete eu-lb
hcloud volume delete eu-backups
# AWS
aws ec2 terminate-instances --region ap-southeast-1 --instance-ids i-xxxxx
aws elbv2 delete-load-balancer --load-balancer-arn arn:aws:elasticloadbalancing:ap-southeast-1:123456789:loadbalancer/app/asia-lb/1234567890abcdef
aws rds delete-db-instance --db-instance-identifier asia-db-replica --skip-final-snapshot
# Route53
aws route53 delete-health-check --health-check-id abc123
aws route53 delete-hosted-zone --id Z1234567890ABC
```
## Next Steps
1. Disaster Recovery Testing: Regular failover drills
2. Auto-scaling: Add provider-specific autoscaling
3. Monitoring Integration: Connect to centralized monitoring (Datadog, New Relic, Prometheus)
4. Backup Automation: Implement cross-region backups
5. Cost Optimization: Review and tune resource sizing
6. Security Hardening: Implement WAF, DDoS protection
7. Load Testing: Validate performance across regions
## Support
For issues or questions:
- Review the multi-provider networking guide
- Check provider-specific documentation
- Review regional deployment logs: `./deploy.nu --debug`
- Test regional endpoints independently
## Files
- `workspace.ncl`: Global infrastructure definition (Nickel)
- `config.toml`: Provider credentials and regional settings
- `deploy.nu`: Multi-region deployment orchestration (Nushell)
- `README.md`: This file