
# Proxy Task Service (HAProxy)

## Overview

The Proxy task service installs and configures [HAProxy](https://www.haproxy.org/), a free, fast, and reliable open-source solution for high availability, load balancing, and proxying of TCP and HTTP-based applications. HAProxy is well suited to very high-traffic websites and is widely regarded as the de facto standard open-source load balancer.

## Features

### Core Proxy Features
- **Layer 4 & Layer 7 Load Balancing** - TCP and HTTP/HTTPS traffic distribution
- **High Availability** - Active/passive and active/active configurations
- **SSL/TLS Termination** - SSL offloading and end-to-end encryption
- **Content-Based Routing** - Route based on URLs, headers, and other criteria
- **Session Persistence** - Sticky sessions and session affinity

### Load Balancing Algorithms
- **Round Robin** - Distribute requests evenly across servers
- **Least Connections** - Route to server with fewest active connections
- **Weighted Round Robin** - Assign different weights to servers
- **Source IP Hash** - Route based on client IP hash
- **URL Hash** - Route based on URL hash for cache optimization
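
As a rough guide, the algorithms above correspond to values of the `balance` field used in the configurations in this document. The helper name `balance_keyword` is hypothetical and exists only for illustration; the keywords it prints (`roundrobin`, `leastconn`, `source`, `uri`) are standard HAProxy `balance` arguments, and weighting is expressed per server with `weight` rather than through a separate algorithm.

```bash
# Map a descriptive algorithm name to the HAProxy `balance` keyword.
# balance_keyword is a hypothetical helper, not part of the task service.
balance_keyword() {
    case "$1" in
        round-robin|weighted-round-robin) echo "roundrobin" ;;  # weights are set per server
        least-connections)                echo "leastconn" ;;
        source-ip-hash)                   echo "source" ;;
        url-hash)                         echo "uri" ;;
        *) echo "unknown algorithm: $1" >&2; return 1 ;;
    esac
}

balance_keyword least-connections   # prints: leastconn
```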

### Health Checking & Monitoring
- **Health Checks** - TCP, HTTP, and custom health checks
- **Server Status Monitoring** - Real-time server status and metrics
- **Statistics Interface** - Built-in web statistics interface
- **Prometheus Metrics** - Native Prometheus metrics export
- **Logging** - Comprehensive access and error logging

### Security Features
- **DDoS Protection** - Rate limiting and connection limits
- **Access Control** - IP-based access control lists
- **SSL Security** - Modern TLS configuration and cipher suites
- **Request Filtering** - Block malicious requests and patterns
- **Security Headers** - Automatic security header injection

### Advanced Features
- **Compression** - HTTP response compression
- **Caching** - Basic HTTP caching capabilities
- **Request Modification** - Header manipulation and URL rewriting
- **Multi-Process/Multi-Thread Mode** - Parallel workers for high concurrency (HAProxy 2.5+ supports threads only)
- **Configuration Validation** - Real-time configuration validation

## Configuration

### Basic HTTP Load Balancer
```kcl
proxy: Proxy = {
    proxy_version: "2.8"
    proxy_lib: "/var/lib/haproxy"
    proxy_cfg_file: "haproxy.cfg"
    run_user: "haproxy"
    run_group: "haproxy"
    run_user_home: "/home/haproxy"
    https_in_binds: [
        {
            ip: "0.0.0.0"
            port: 80
        },
        {
            ip: "0.0.0.0"
            port: 443
        }
    ]
    https_options: ["tcplog", "dontlognull", "httplog"]
    https_log_format: "%H %ci:%cp [%t] %ft %b/%s %Tw/%Tc/%Tt %B %ts %ac/%fc/%bc/%sc/%rc %sq/%bq"
    backends: [
        {
            name: "web_backend"
            ssl_sni: "example.com"
            mode: "http"
            balance: "roundrobin"
            option: "httpchk GET /health"
            server_name: "web1"
            server_host_ip: "10.0.1.10"
            server_port: 8080
            server_ops: "check fall 3 rise 2"
        }
    ]
}
```
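
The task service's actual template is not shown here, so the following is only a sketch of the kind of `backend` section a definition like the one above could translate to. `render_backend` and its argument order are hypothetical; the emitted directives (`mode`, `balance`, `option httpchk`, `server`) are standard HAProxy configuration keywords.

```bash
# Print an haproxy.cfg backend stanza from the fields of one `backends` entry.
# Hypothetical helper for illustration; the real template may differ.
render_backend() {
    local name="$1" mode="$2" balance="$3" option="$4"
    local server_name="$5" host_ip="$6" port="$7" server_ops="$8"
    printf 'backend %s\n' "$name"
    printf '    mode %s\n' "$mode"
    printf '    balance %s\n' "$balance"
    printf '    option %s\n' "$option"
    printf '    server %s %s:%s %s\n' "$server_name" "$host_ip" "$port" "$server_ops"
}

render_backend web_backend http roundrobin "httpchk GET /health" \
    web1 10.0.1.10 8080 "check fall 3 rise 2"
```

Comparing such output with the generated `/etc/haproxy/haproxy.cfg` is a quick way to confirm how each field is actually rendered.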

### Production HTTPS Load Balancer
```kcl
proxy: Proxy = {
    proxy_version: "2.8"
    proxy_lib: "/var/lib/haproxy"
    proxy_cfg_file: "haproxy.cfg"
    run_user: "haproxy"
    run_group: "haproxy"
    run_user_home: "/var/lib/haproxy"
    https_in_binds: [
        {
            ip: "0.0.0.0"
            port: 80
        },
        {
            ip: "0.0.0.0"
            port: 443
        }
    ]
    https_options: ["tcplog", "dontlognull", "httplog", "log-health-checks"]
    https_log_format: "%H %ci:%cp [%t] %ft %b/%s %Tw/%Tc/%Tt %B %ts %ac/%fc/%bc/%sc/%rc %sq/%bq %hr %hs %{+Q}r"
    ssl: {
        enabled: true
        certificate_path: "/etc/ssl/haproxy"
        certificate_file: "haproxy.pem"
        protocols: "TLSv1.2 TLSv1.3"
        ciphers: "ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305"
        redirect_http_to_https: true
        hsts_enabled: true
        hsts_max_age: 31536000
    }
    backends: [
        {
            name: "web_backend"
            ssl_sni: "api.company.com"
            mode: "http"
            balance: "leastconn"
            option: "httpchk GET /api/health HTTP/1.1\\r\\nHost:\\ api.company.com"
            server_name: "api1"
            server_host_ip: "10.0.1.10"
            server_port: 8080
            server_ops: "check fall 3 rise 2 weight 100"
        },
        {
            name: "web_backend"
            ssl_sni: "api.company.com"
            mode: "http"
            balance: "leastconn"
            option: "httpchk GET /api/health HTTP/1.1\\r\\nHost:\\ api.company.com"
            server_name: "api2"
            server_host_ip: "10.0.1.11"
            server_port: 8080
            server_ops: "check fall 3 rise 2 weight 100"
        },
        {
            name: "web_backend"
            ssl_sni: "api.company.com"
            mode: "http"
            balance: "leastconn"
            option: "httpchk GET /api/health HTTP/1.1\\r\\nHost:\\ api.company.com"
            server_name: "api3"
            server_host_ip: "10.0.1.12"
            server_port: 8080
            server_ops: "check fall 3 rise 2 weight 50 backup"
        }
    ]
    performance: {
        maxconn: 4096
        nbproc: 4  # HAProxy removed the nbproc directive in 2.5; confirm how this field is rendered for 2.8
        cpu_map: "auto"
        tune_ssl_default_dh_param: 2048
        tune_bufsize: 32768
        tune_maxrewrite: 8192
    }
}
```

### Multi-Service Load Balancer
```kcl
proxy: Proxy = {
    proxy_version: "2.8"
    # ... base configuration
    https_in_binds: [
        {
            ip: "0.0.0.0"
            port: 80
        },
        {
            ip: "0.0.0.0"
            port: 443
        }
    ]
    backends: [
        {
            name: "api_backend"
            ssl_sni: "api.company.com"
            mode: "http"
            balance: "roundrobin"
            option: "httpchk GET /health"
            server_name: "api1"
            server_host_ip: "10.0.1.10"
            server_port: 3000
            server_ops: "check fall 3 rise 2"
        },
        {
            name: "api_backend"
            ssl_sni: "api.company.com"
            mode: "http"
            balance: "roundrobin"
            option: "httpchk GET /health"
            server_name: "api2"
            server_host_ip: "10.0.1.11"
            server_port: 3000
            server_ops: "check fall 3 rise 2"
        },
        {
            name: "web_backend"
            ssl_sni: "www.company.com"
            mode: "http"
            balance: "source"
            option: "httpchk GET /"
            server_name: "web1"
            server_host_ip: "10.0.2.10"
            server_port: 80
            server_ops: "check fall 3 rise 2"
        },
        {
            name: "web_backend"
            ssl_sni: "www.company.com"
            mode: "http"
            balance: "source"
            option: "httpchk GET /"
            server_name: "web2"
            server_host_ip: "10.0.2.11"
            server_port: 80
            server_ops: "check fall 3 rise 2"
        }
    ]
    routing_rules: [
        {
            condition: "hdr(host) -i api.company.com"
            backend: "api_backend"
        },
        {
            condition: "hdr(host) -i www.company.com"
            backend: "web_backend"
        },
        {
            condition: "path_beg /api/"
            backend: "api_backend"
        }
    ]
}
```
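
HAProxy evaluates `use_backend` rules in declaration order and the first match wins. With the rules above, a request for `www.company.com/api/...` would therefore go to `web_backend`, and the `path_beg /api/` rule only applies to other hosts. The sketch below imitates that ordering; `route_backend` is a hypothetical helper, and it assumes each routing rule is rendered as one `use_backend` line in the listed order.

```bash
# Imitate first-match-wins evaluation of the three routing rules.
# route_backend HOST PATH -> prints the selected backend name.
route_backend() {
    local host path
    host="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"  # -i: case-insensitive host match
    path="$2"
    if [ "$host" = "api.company.com" ]; then
        echo "api_backend"
    elif [ "$host" = "www.company.com" ]; then
        echo "web_backend"
    elif [ "${path#/api/}" != "$path" ]; then   # path_beg /api/
        echo "api_backend"
    else
        return 1                                # no rule matched; a default backend would apply
    fi
}

route_backend "www.company.com" "/api/users"    # prints: web_backend
route_backend "other.company.com" "/api/users"  # prints: api_backend
```

If path-based routing should take precedence over host-based routing, the `path_beg` rule would need to be listed first.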

### TCP Load Balancer for Databases
```kcl
proxy: Proxy = {
    proxy_version: "2.8"
    # ... base configuration
    https_in_binds: [
        {
            ip: "0.0.0.0"
            port: 5432
        },
        {
            ip: "0.0.0.0"
            port: 3306
        }
    ]
    https_options: ["tcplog", "dontlognull"]
    backends: [
        {
            name: "postgres_backend"
            ssl_sni: "postgres.company.com"
            mode: "tcp"
            balance: "leastconn"
            option: "tcp-check"
            server_name: "postgres1"
            server_host_ip: "10.0.3.10"
            server_port: 5432
            server_ops: "check fall 3 rise 2"
        },
        {
            name: "postgres_backend"
            ssl_sni: "postgres.company.com"
            mode: "tcp"
            balance: "leastconn"
            option: "tcp-check"
            server_name: "postgres2"
            server_host_ip: "10.0.3.11"
            server_port: 5432
            server_ops: "check fall 3 rise 2 backup"
        },
        {
            name: "mysql_backend"
            ssl_sni: "mysql.company.com"
            mode: "tcp"
            balance: "source"
            option: "mysql-check user haproxy"
            server_name: "mysql1"
            server_host_ip: "10.0.4.10"
            server_port: 3306
            server_ops: "check fall 3 rise 2"
        }
    ]
    tcp_services: [
        {
            bind_port: 5432
            backend: "postgres_backend"
        },
        {
            bind_port: 3306
            backend: "mysql_backend"
        }
    ]
}
```

### High-Availability Configuration
```kcl
proxy: Proxy = {
    proxy_version: "2.8"
    # ... base configuration
    ha_config: {
        keepalived: {
            enabled: true
            virtual_ip: "10.0.0.100"
            interface: "eth0"
            priority: 100  # Master: 100, Backup: 90
            advert_int: 1
            auth_pass: "haproxy_vip_password"
        }
        peers: [
            {
                name: "haproxy1"
                ip: "10.0.0.10"
            },
            {
                name: "haproxy2"
                ip: "10.0.0.11"
            }
        ]
        stick_tables: true
        session_synchronization: true
    }
    monitoring: {
        stats: {
            enabled: true
            bind_ip: "127.0.0.1"
            bind_port: 8404
            uri: "/stats"
            username: "admin"
            password: "admin123"  # example only; use a strong, unique secret
            refresh: 30
        }
        prometheus: {
            enabled: true
            bind_ip: "127.0.0.1"
            bind_port: 8405
            uri: "/metrics"
        }
        health_checks: {
            enabled: true
            log_health_checks: true
            email_alerts: "admin@company.com"
        }
    }
}
```

## Usage

### Deploy HAProxy
```bash
./core/nulib/provisioning taskserv create proxy --infra <infrastructure-name>
```

### List Available Task Services
```bash
./core/nulib/provisioning taskserv list
```

### SSH to Proxy Server
```bash
./core/nulib/provisioning server ssh <proxy-server>
```

### Service Management
```bash
# Check HAProxy status
systemctl status haproxy

# Start/stop HAProxy
systemctl start haproxy
systemctl stop haproxy
systemctl restart haproxy

# Reload configuration without downtime
systemctl reload haproxy

# Check HAProxy version
haproxy -v
```

### Configuration Management
```bash
# Test configuration syntax
haproxy -c -f /etc/haproxy/haproxy.cfg

# Check configuration with detailed output
haproxy -c -V -f /etc/haproxy/haproxy.cfg

# Reload configuration gracefully
sudo haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -sf $(cat /var/run/haproxy.pid)

# View current configuration
cat /etc/haproxy/haproxy.cfg
```
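
The validation and reload commands above are often chained so that a broken file never reaches the running service. A minimal sketch, assuming the default configuration path used in this document; `safe_reload` and the `HAPROXY_BIN` / `RELOAD_CMD` override variables are hypothetical names introduced here for illustration.

```bash
# Validate the configuration first; reload only if the check passes.
# HAPROXY_BIN and RELOAD_CMD are hypothetical overrides (useful for dry runs).
safe_reload() {
    local cfg="${1:-/etc/haproxy/haproxy.cfg}"
    local haproxy_bin="${HAPROXY_BIN:-haproxy}"
    local reload_cmd="${RELOAD_CMD:-systemctl reload haproxy}"

    if ! "$haproxy_bin" -c -f "$cfg"; then
        echo "configuration check failed for $cfg; not reloading" >&2
        return 1
    fi
    $reload_cmd   # intentionally unquoted so the command and its arguments split
}

# Usage on the proxy server (as root):
#   safe_reload
#   safe_reload /etc/haproxy/haproxy.cfg
```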

### Statistics and Monitoring
```bash
# Access statistics via command line
echo "show info; show stat" | socat stdio /var/lib/haproxy/stats

# View current sessions
echo "show sess" | socat stdio /var/lib/haproxy/stats

# Show backend servers status
echo "show servers state" | socat stdio /var/lib/haproxy/stats

# Disable/enable server
echo "disable server backend/server1" | socat stdio /var/lib/haproxy/stats
echo "enable server backend/server1" | socat stdio /var/lib/haproxy/stats
```
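
The `show stat` output is CSV with a header line, which makes it straightforward to post-process. The sketch below lists servers whose status is not `UP`; `servers_not_up` is a hypothetical helper, and it looks up the `pxname`, `svname`, and `status` columns by header name rather than assuming fixed positions.

```bash
# Read `show stat` CSV on stdin and print "proxy/server status" for every
# server row whose status does not start with UP. FRONTEND/BACKEND rows are skipped.
servers_not_up() {
    awk -F, '
        NR == 1 {
            sub(/^# /, "", $1)                       # header starts with "# pxname"
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        NF == 0 { next }                             # skip blank lines
        $(col["svname"]) != "FRONTEND" && $(col["svname"]) != "BACKEND" &&
        $(col["status"]) !~ /^UP/ {
            print $(col["pxname"]) "/" $(col["svname"]), $(col["status"])
        }
    '
}

# Usage on the proxy server:
#   echo "show stat" | socat stdio /var/lib/haproxy/stats | servers_not_up
```

Servers without a configured health check report `no check` and would also be listed, which can be a useful reminder that they are unmonitored.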

### SSL Certificate Management
```bash
# Create combined certificate file
cat /etc/ssl/certs/company.crt /etc/ssl/private/company.key > /etc/ssl/haproxy/haproxy.pem

# Set proper permissions
chmod 600 /etc/ssl/haproxy/haproxy.pem
chown haproxy:haproxy /etc/ssl/haproxy/haproxy.pem

# Test SSL configuration
openssl s_client -connect localhost:443 -servername company.com

# Check certificate expiration
openssl x509 -in /etc/ssl/haproxy/haproxy.pem -noout -dates
```

### Performance Tuning
```bash
# Check current connections
echo "show info" | socat stdio /var/lib/haproxy/stats | grep -E "(CurrConns|MaxConns)"

# Monitor connection rates
echo "show info" | socat stdio /var/lib/haproxy/stats | grep -E "(ConnRate|SessRate)"

# Check memory usage (pgrep -o selects the oldest haproxy process, since several may be running)
ps aux | grep '[h]aproxy'
grep -E "(VmSize|VmRSS)" "/proc/$(pgrep -o haproxy)/status"

# Monitor network I/O
iftop -i eth0 -f "port 80 or port 443"
```

### Log Analysis
```bash
# View real-time access logs
tail -f /var/log/haproxy/access.log

# Analyze response times (10 slowest; the last value of the timer field is the total time)
awk '{n = split($10, t, "/"); print t[n]}' /var/log/haproxy/access.log | sort -n | tail -10

# Count status codes
awk '{print $11}' /var/log/haproxy/access.log | sort | uniq -c

# Top client IPs
awk '{print $6}' /var/log/haproxy/access.log | cut -d: -f1 | sort | uniq -c | sort -nr | head -10
```
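
Raw status counts are easier to read when grouped by class. The sketch below assumes the same field layout as the commands above (status code in field 11); `status_summary` is a hypothetical helper, and with a custom `https_log_format` the field number would need adjusting.

```bash
# Summarize HTTP status classes and the share of 5xx responses
# from an access log on stdin. Non-numeric or -1 statuses are ignored.
status_summary() {
    awk '
        $11 ~ /^[1-5][0-9][0-9]$/ { cnt[substr($11, 1, 1) "xx"]++; total++ }
        END {
            for (k = 1; k <= 5; k++) {
                name = k "xx"
                if (cnt[name] > 0) print name, cnt[name]
            }
            if (total > 0) printf "5xx_percent %d\n", 100 * cnt["5xx"] / total
        }
    '
}

# Usage:
#   status_summary < /var/log/haproxy/access.log
```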

## Architecture

### System Architecture
```
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Clients       │────│   HAProxy        │────│   Backend       │
│                 │    │                  │    │   Servers       │
│ • Web Browsers  │    │ • Load Balancer  │    │                 │
│ • Mobile Apps   │────│ • SSL Termination│────│ • Web Servers   │
│ • API Clients   │    │ • Health Checks  │    │ • App Servers   │
│ • Load Testing  │    │ • Rate Limiting  │    │ • Databases     │
└─────────────────┘    └──────────────────┘    └─────────────────┘
```

### High-Availability Architecture
```
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Virtual IP    │    │   HAProxy        │    │   Backend       │
│   (Keepalived)  │    │   Cluster        │    │   Pool          │
│                 │    │                  │    │                 │
│ • 10.0.0.100    │────│ • Master: Active │────│ • Server 1      │
│ • Failover      │    │ • Backup: Standby│    │ • Server 2      │
│ • Health Check  │    │ • Sync Sessions  │    │ • Server 3      │
└─────────────────┘    └──────────────────┘    └─────────────────┘
```

### Request Flow Architecture
```
Client Request → Frontend → ACL Rules → Backend Selection → Health Check → Server Selection → Response
     ↓              ↓           ↓             ↓               ↓              ↓           ↓
SSL Termination → Routing → Load Balancing → Failover → Server Response → SSL → Client
     ↓              ↓           ↓             ↓               ↓              ↓           ↓
Certificate    → Headers → Session Persistence → Backup Server → Compression → Headers → Browser
```

### File Structure
```
/etc/haproxy/               # Configuration directory
├── haproxy.cfg            # Main configuration file
├── errors/               # Custom error pages
│   ├── 400.http         # Bad request error page
│   ├── 403.http         # Forbidden error page
│   ├── 408.http         # Request timeout
│   ├── 500.http         # Internal server error
│   ├── 502.http         # Bad gateway
│   ├── 503.http         # Service unavailable
│   └── 504.http         # Gateway timeout
└── certs/               # SSL certificates
    └── haproxy.pem     # Combined certificate file

/var/lib/haproxy/          # Runtime directory
├── stats                 # Statistics socket
└── info                 # Runtime information

/var/log/haproxy/         # Log directory
├── access.log           # Access logs
├── error.log            # Error logs
└── haproxy.log          # Combined logs

/run/haproxy/             # Process runtime
└── haproxy.pid          # Process ID file
```

## Supported Operating Systems

- Ubuntu 20.04+ / Debian 11+
- CentOS 8+ / RHEL 8+ / Fedora 35+
- Amazon Linux 2+
- SUSE Linux Enterprise 15+

## System Requirements

### Minimum Requirements
- **RAM**: 1GB (2GB+ recommended)
- **Storage**: 10GB (20GB+ for logs)
- **CPU**: 2 cores (4+ cores recommended)
- **Network**: 100Mbps (1Gbps+ for high load)

### Production Requirements
- **RAM**: 4GB+ (8GB+ for high concurrency)
- **Storage**: 50GB+ SSD
- **CPU**: 4+ cores (16+ cores for very high load)
- **Network**: 1Gbps+ with low latency

### Performance Requirements
- **Network Bandwidth**: Adequate for peak traffic
- **CPU Performance**: High single-thread performance
- **Memory**: Sufficient for connection state and SSL
- **Disk I/O**: Fast storage for logging

## Troubleshooting

### Service Issues
```bash
# Check HAProxy status
systemctl status haproxy

# Test configuration
haproxy -c -f /etc/haproxy/haproxy.cfg

# View error logs
tail -f /var/log/haproxy/error.log

# Check process information
ps aux | grep haproxy
```

### Connection Issues
```bash
# Check listening ports
netstat -tlnp | grep haproxy
ss -tlnp | grep haproxy

# Test frontend connectivity
curl -I http://localhost/
telnet localhost 80

# Check backend connectivity
curl -I http://backend-server:8080/health

# Monitor active connections
echo "show info" | socat stdio /var/lib/haproxy/stats
```

### SSL Issues
```bash
# Test SSL connectivity
openssl s_client -connect localhost:443

# Check certificate validity
openssl x509 -in /etc/ssl/haproxy/haproxy.pem -noout -text

# Verify certificate chain
openssl verify -CApath /etc/ssl/certs /etc/ssl/haproxy/haproxy.pem

# Check SSL logs
grep -i ssl /var/log/haproxy/error.log
```

### Performance Issues
```bash
# Check HAProxy statistics
echo "show info; show stat" | socat stdio /var/lib/haproxy/stats

# Monitor system resources
htop
iostat -x 1
iftop -i eth0

# Check connection limits
ulimit -n
cat /proc/sys/net/core/somaxconn

# Analyze access patterns
tail -f /var/log/haproxy/access.log | awk '{print $6, $11, $10}'
```

### Backend Health Issues
```bash
# Check backend server status
echo "show servers state" | socat stdio /var/lib/haproxy/stats

# Test backend health checks
curl -I http://backend-server:8080/health

# Enable/disable servers
echo "enable server backend/server1" | socat stdio /var/lib/haproxy/stats
echo "disable server backend/server1" | socat stdio /var/lib/haproxy/stats

# Check health check logs
grep "Health check" /var/log/haproxy/error.log
```

## Security Considerations

### SSL/TLS Security
- **Strong Ciphers** - Use modern, secure cipher suites
- **Protocol Versions** - Disable older TLS versions
- **Certificate Management** - Regular certificate renewal
- **Perfect Forward Secrecy** - Enable PFS for all connections

### Access Control
- **IP Whitelisting** - Restrict admin access by IP
- **Rate Limiting** - Implement request rate limiting
- **DDoS Protection** - Configure connection and rate limits
- **Firewall Rules** - Limit access to necessary ports
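
As one possible way to implement the rate-limiting item, HAProxy can track per-client request rates in a stick table and reject clients above a threshold. The sketch below only prints the relevant frontend directives; `rate_limit_snippet` is a hypothetical helper, the table size, expiry, and thresholds are illustrative values, and how such directives would be passed through the task service's schema is not covered here.

```bash
# Print frontend directives that track the per-client HTTP request rate and
# reject clients above a threshold with HTTP 429. Values are illustrative only.
rate_limit_snippet() {
    local max_requests="${1:-100}" period="${2:-10s}"
    cat <<EOF
    stick-table type ip size 100k expire 30s store http_req_rate(${period})
    http-request track-sc0 src
    http-request deny deny_status 429 if { sc_http_req_rate(0) gt ${max_requests} }
EOF
}

rate_limit_snippet 100 10s
```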

### Configuration Security
- **Secure Headers** - Add security headers to responses
- **Error Page Security** - Don't expose internal information
- **Log Security** - Secure log files and prevent log injection
- **Process Security** - Run with minimum required privileges

### Network Security
- **Network Segmentation** - Isolate proxy and backend networks
- **Monitoring** - Monitor for suspicious traffic patterns
- **Regular Updates** - Keep HAProxy updated to latest version
- **Security Audits** - Regular security configuration reviews

## Performance Optimization

### Hardware Optimization
- **CPU** - High single-thread performance for SSL termination
- **Memory** - Adequate RAM for connection state and buffers
- **Network** - High-bandwidth, low-latency network interfaces
- **Storage** - Fast storage for logging and certificates

### Configuration Optimization
- **Connection Limits** - Optimize maxconn and server limits
- **Buffer Sizes** - Tune buffer sizes for your workload
- **SSL Optimization** - Optimize SSL session caching
- **Health Check Intervals** - Balance responsiveness and overhead

### System Optimization
- **Kernel Parameters** - Tune TCP/IP stack parameters
- **File Descriptors** - Increase ulimit for connections
- **CPU Affinity** - Bind processes to specific CPU cores
- **Memory Management** - Optimize memory allocation

### Load Balancing Optimization
- **Algorithm Selection** - Choose optimal load balancing algorithm
- **Health Checks** - Efficient health check configuration
- **Session Persistence** - Optimize sticky session handling
- **Backend Weights** - Balance load based on server capacity

## Integration Examples

### Nginx Integration (Frontend Proxy)
```nginx
upstream haproxy_backend {
    server 10.0.1.10:80;
    server 10.0.1.11:80;
}

server {
    listen 80;
    server_name company.com;

    location / {
        proxy_pass http://haproxy_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

### Keepalived Configuration
```conf
# /etc/keepalived/keepalived.conf
vrrp_script chk_haproxy {
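    # Signal 0 only tests that a haproxy process exists; backticks need a shell,
    # which Keepalived may not use. Adjust the killall path for your distribution.
    script "/usr/bin/killall -0 haproxy"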
    interval 2
    weight 2
    fall 3
    rise 2
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 101
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass haproxy_pass
    }
    virtual_ipaddress {
        10.0.0.100
    }
    track_script {
        chk_haproxy
    }
}
```
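
With a positive `weight`, Keepalived adds the weight to a node's priority while the tracked script succeeds, so a script failure only triggers failover when the gap between the two nodes' base priorities is smaller than the weight. The arithmetic can be checked as follows; `effective_priority` is a hypothetical helper, and the backup priority of 100 is an assumed value, since the backup node's file is not shown in this document.

```bash
# Effective VRRP priority for a positive track_script weight:
# base priority plus weight while the check succeeds, base priority otherwise.
effective_priority() {
    local base="$1" weight="$2" check_ok="$3"
    if [ "$check_ok" = "yes" ]; then
        echo $((base + weight))
    else
        echo "$base"
    fi
}

master_failed=$(effective_priority 101 2 no)    # 101
backup_ok=$(effective_priority 100 2 yes)       # 102 (assumed backup priority 100)
if [ "$backup_ok" -gt "$master_failed" ]; then
    echo "backup takes over the virtual IP"
fi
```

With a gap larger than the weight (for example a backup priority of 90), the failed master at 101 would still outrank the healthy backup at 92, and the virtual IP would not move on a script failure alone.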

### Prometheus Monitoring
```yaml
# prometheus.yml
scrape_configs:
  - job_name: 'haproxy'
    static_configs:
      - targets: ['localhost:8405']
    scrape_interval: 30s
    metrics_path: '/metrics'
```

## Resources

- **Official Documentation**: [docs.haproxy.org](https://docs.haproxy.org/)
- **HAProxy Community**: [discourse.haproxy.org](https://discourse.haproxy.org/)
- **Configuration Manual (HTML)**: [haproxytech.github.io/haproxy-dconv](https://haproxytech.github.io/haproxy-dconv/)
- **Best Practices**: [haproxy.com/blog](https://www.haproxy.com/blog/)
- **GitHub Repository**: [haproxy/haproxy](https://github.com/haproxy/haproxy)