# TypeDialog Agents - Deployment Guide
Production deployment guide for the TypeDialog Agents system.
## Table of Contents
1. [Deployment Options](#deployment-options)
2. [Local Development](#local-development)
3. [Systemd Service](#systemd-service-linux)
4. [Docker Deployment](#docker-deployment)
5. [Kubernetes](#kubernetes)
6. [Reverse Proxy](#reverse-proxy-nginx)
7. [Monitoring](#monitoring)
8. [Security](#security)
9. [Performance Tuning](#performance-tuning)
10. [Backup & Recovery](#backup--recovery)
---
## Deployment Options
| Method | Complexity | Scalability | Best For |
|--------|-----------|-------------|----------|
| **Systemd Service** | Low | Single server | Small deployments |
| **Docker** | Medium | Multi-container | Standard deployments |
| **Kubernetes** | High | Cluster | Large-scale production |
| **Serverless** | Medium | Auto-scale | Variable workloads |
---
## Local Development
### Development Server
```bash
# Start development server with hot reload
cargo watch -x 'run --package typedialog-ag -- serve --port 8765'
# Or use release build
cargo run --release --package typedialog-ag -- serve --port 8765
```
### Environment Setup
Create a `.env` file:
```bash
# .env
ANTHROPIC_API_KEY=sk-ant-your-key-here
OPENAI_API_KEY=sk-your-openai-key
RUST_LOG=info
TYPEAGENT_CACHE_DIR=/tmp/typeagent-cache
```
Load environment:
```bash
# Export every variable defined in .env so the server process inherits them
set -a
source .env
set +a
typedialog-ag serve --port 8765
```
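A missing key is easier to diagnose before the server starts than from a failed request later. The following is a minimal sketch of a pre-flight check; the helper name `require_env` is hypothetical, and treating `ANTHROPIC_API_KEY` as the only required variable is an assumption based on the examples in this guide.

```bash
#!/usr/bin/env bash
# Pre-flight check: fail fast when a required variable is unset or empty.
# Assumption: which variables are required depends on the providers you use.

require_env() {
    local missing=0 name
    for name in "$@"; do
        # ${!name:-} is bash indirect expansion: the value of the variable
        # whose name is stored in $name, or empty if it is unset.
        if [ -z "${!name:-}" ]; then
            echo "missing required variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example:
#   require_env ANTHROPIC_API_KEY && typedialog-ag serve --port 8765
```

The function checks every name before returning, so one run reports all missing variables rather than only the first.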
---
## Systemd Service (Linux)
### Create Service File
Create `/etc/systemd/system/typeagent.service`:
```ini
[Unit]
Description=TypeDialog Agent HTTP Server
After=network-online.target
Wants=network-online.target
# Start rate limiting (these settings belong in [Unit], not [Service])
StartLimitBurst=5
StartLimitIntervalSec=60
[Service]
Type=simple
User=typeagent
Group=typeagent
WorkingDirectory=/opt/typeagent
ExecStart=/usr/local/bin/typedialog-ag serve --port 8765
# Environment
Environment="ANTHROPIC_API_KEY=sk-ant-your-key-here"
Environment="RUST_LOG=info"
Environment="TYPEAGENT_CACHE_DIR=/var/cache/typeagent"
# Restart policy
Restart=always
RestartSec=10
# Resource limits
MemoryMax=2G
CPUQuota=200%
# Security hardening
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/var/cache/typeagent /var/log/typeagent
# Logging
StandardOutput=journal
StandardError=journal
SyslogIdentifier=typeagent
[Install]
WantedBy=multi-user.target
```
### Setup Service
```bash
# Create user
sudo useradd -r -s /bin/false typeagent
# Create directories
sudo mkdir -p /opt/typeagent/agents
sudo mkdir -p /var/cache/typeagent
sudo mkdir -p /var/log/typeagent
# Copy binary
sudo cp target/release/typedialog-ag /usr/local/bin/
sudo chmod 755 /usr/local/bin/typedialog-ag
# Copy agents
sudo cp -r agents/* /opt/typeagent/agents/
# Set permissions
sudo chown -R typeagent:typeagent /opt/typeagent
sudo chown -R typeagent:typeagent /var/cache/typeagent
sudo chown -R typeagent:typeagent /var/log/typeagent
# Install service
sudo systemctl daemon-reload
sudo systemctl enable typeagent
sudo systemctl start typeagent
```
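The unit file above places the API key on an `Environment=` line, where anyone who can read the unit file can see it. One common alternative is a root-owned environment file loaded through systemd's `EnvironmentFile=` directive. The sketch below writes such a file with owner-only permissions; the helper name `write_env_file` and the path `/etc/typeagent/env` are illustrative assumptions, not part of TypeDialog.

```bash
#!/usr/bin/env bash
# Write a single NAME=value line to an env file readable only by its owner.
# The file path used in the example is a hypothetical choice.

write_env_file() {
    local path="$1" name="$2" value="$3"
    (
        umask 077    # a newly created file gets mode 600
        printf '%s=%s\n' "$name" "$value" > "$path"
    )
    chmod 600 "$path"    # also tighten a file that already existed
}

# Example (as root):
#   mkdir -p /etc/typeagent
#   write_env_file /etc/typeagent/env ANTHROPIC_API_KEY "sk-ant-your-key-here"
# Then replace the Environment="ANTHROPIC_API_KEY=..." line in the unit with:
#   EnvironmentFile=/etc/typeagent/env
```

The function overwrites the file, so it suits a single secret; for several variables, append lines to the file by hand.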
### Manage Service
```bash
# Start service
sudo systemctl start typeagent
# Stop service
sudo systemctl stop typeagent
# Restart service
sudo systemctl restart typeagent
# Check status
sudo systemctl status typeagent
# View logs
sudo journalctl -u typeagent -f
# View last 100 lines
sudo journalctl -u typeagent -n 100
```
### Update Deployment
```bash
# Build new version
cargo build --release --package typedialog-ag
# Stop service
sudo systemctl stop typeagent
# Update binary
sudo cp target/release/typedialog-ag /usr/local/bin/
# Start service
sudo systemctl start typeagent
# Verify
sudo systemctl status typeagent
```
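The update steps above overwrite the binary in place, so a bad build leaves nothing to fall back to. A minimal sketch of one way to keep the previous binary for rollback follows; the function names and the `.prev` suffix are hypothetical conventions.

```bash
#!/usr/bin/env bash
# Install a new binary while keeping the previous one as <dest>.prev.

install_with_backup() {
    local src="$1" dest="$2"
    if [ -f "$dest" ]; then
        cp -p "$dest" "$dest.prev"    # keep the currently installed version
    fi
    install -m 755 "$src" "$dest"
}

# Put the saved binary back in place; fails if there is nothing to restore.
rollback() {
    local dest="$1"
    if [ ! -f "$dest.prev" ]; then
        echo "no previous binary at $dest.prev" >&2
        return 1
    fi
    mv "$dest.prev" "$dest"
}

# Example (run as root, with the service stopped):
#   install_with_backup target/release/typedialog-ag /usr/local/bin/typedialog-ag
#   rollback /usr/local/bin/typedialog-ag    # only if the new version misbehaves
```

Only one previous version is kept; each update replaces the earlier `.prev` copy.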
---
## Docker Deployment
### Dockerfile
Create `docker/Dockerfile`:
```dockerfile
# Multi-stage build for minimal image size
FROM rust:1.75-slim-bullseye AS builder
# Install dependencies
RUN apt-get update && apt-get install -y \
    pkg-config \
    libssl-dev \
    && rm -rf /var/lib/apt/lists/*
# Create app directory
WORKDIR /app
# Copy manifests
COPY Cargo.toml Cargo.lock ./
COPY crates ./crates
# Build release binary
RUN cargo build --release --package typedialog-ag
# Runtime stage
FROM debian:bullseye-slim
# Install runtime dependencies (curl is required by the HEALTHCHECK below)
RUN apt-get update && apt-get install -y \
    ca-certificates \
    curl \
    libssl1.1 \
    && rm -rf /var/lib/apt/lists/*
# Create non-root user
RUN useradd -r -s /bin/false typeagent
# Create directories
RUN mkdir -p /app/agents /var/cache/typeagent
RUN chown -R typeagent:typeagent /app /var/cache/typeagent
# Copy binary from builder
COPY --from=builder /app/target/release/typedialog-ag /usr/local/bin/
# Copy agents
COPY agents /app/agents
# Switch to non-root user
USER typeagent
WORKDIR /app
# Expose port
EXPOSE 8765
# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:8765/health || exit 1
# Run server
CMD ["typedialog-ag", "serve", "--port", "8765"]
```
### Build Image
```bash
cd /path/to/typedialog
# Build image
docker build -t typeagent:latest -f docker/Dockerfile .
# Build with specific tag
docker build -t typeagent:v1.0.0 -f docker/Dockerfile .
# Verify image
docker images | grep typeagent
```
### Run Container
```bash
# Run with environment variables
docker run -d \
  --name typeagent \
  -p 8765:8765 \
  -e ANTHROPIC_API_KEY=sk-ant-your-key-here \
  -e RUST_LOG=info \
  --restart unless-stopped \
  typeagent:latest
# Or run with volume mounts (an alternative to the command above; to reuse the
# container name, remove the existing container first: docker rm -f typeagent)
docker run -d \
  --name typeagent \
  -p 8765:8765 \
  -e ANTHROPIC_API_KEY=sk-ant-your-key-here \
  -v "$(pwd)/agents:/app/agents:ro" \
  -v typeagent-cache:/var/cache/typeagent \
  --restart unless-stopped \
  typeagent:latest
# Check logs
docker logs -f typeagent
# Check health
docker ps | grep typeagent
curl http://localhost:8765/health
```
### Docker Compose
Create `docker/docker-compose.yml`:
```yaml
version: '3.8'
services:
  typeagent:
    image: typeagent:latest
    build:
      context: ..
      dockerfile: docker/Dockerfile
    container_name: typeagent
    restart: unless-stopped
    ports:
      - "8765:8765"
    environment:
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - OPENAI_API_KEY=${OPENAI_API_KEY:-}
      - RUST_LOG=info
      - TYPEAGENT_CACHE_DIR=/var/cache/typeagent
    volumes:
      - ../agents:/app/agents:ro
      - typeagent-cache:/var/cache/typeagent
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8765/health"]
      interval: 30s
      timeout: 3s
      retries: 3
      start_period: 5s
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 2G
        reservations:
          cpus: '0.5'
          memory: 512M
  # Optional: Nginx reverse proxy
  nginx:
    image: nginx:alpine
    container_name: typeagent-nginx
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
      - ./ssl:/etc/nginx/ssl:ro
    depends_on:
      - typeagent
volumes:
  typeagent-cache:
    driver: local
```
Usage:
```bash
cd docker
# Start services
docker-compose up -d
# View logs
docker-compose logs -f
# Stop services
docker-compose down
# Restart
docker-compose restart
# Update and restart
docker-compose build
docker-compose up -d
```
---
## Kubernetes
### Deployment Manifest
Create `k8s/deployment.yaml`:
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: typeagent
---
apiVersion: v1
kind: Secret
metadata:
  name: typeagent-secrets
  namespace: typeagent
type: Opaque
stringData:
  anthropic-api-key: sk-ant-your-key-here
  openai-api-key: sk-your-openai-key
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: typeagent-config
  namespace: typeagent
data:
  RUST_LOG: "info"
  TYPEAGENT_CACHE_DIR: "/var/cache/typeagent"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: typeagent
  namespace: typeagent
  labels:
    app: typeagent
spec:
  replicas: 3
  selector:
    matchLabels:
      app: typeagent
  template:
    metadata:
      labels:
        app: typeagent
    spec:
      containers:
        - name: typeagent
          image: typeagent:latest
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 8765
              name: http
          env:
            - name: ANTHROPIC_API_KEY
              valueFrom:
                secretKeyRef:
                  name: typeagent-secrets
                  key: anthropic-api-key
            - name: RUST_LOG
              valueFrom:
                configMapKeyRef:
                  name: typeagent-config
                  key: RUST_LOG
            - name: TYPEAGENT_CACHE_DIR
              valueFrom:
                configMapKeyRef:
                  name: typeagent-config
                  key: TYPEAGENT_CACHE_DIR
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "2Gi"
              cpu: "2000m"
          livenessProbe:
            httpGet:
              path: /health
              port: 8765
            initialDelaySeconds: 10
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /health
              port: 8765
            initialDelaySeconds: 5
            periodSeconds: 10
          volumeMounts:
            - name: cache
              mountPath: /var/cache/typeagent
      volumes:
        - name: cache
          emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: typeagent
  namespace: typeagent
spec:
  selector:
    app: typeagent
  ports:
    - port: 80
      targetPort: 8765
      protocol: TCP
  type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: typeagent
  namespace: typeagent
  annotations:
    kubernetes.io/ingress.class: nginx
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  tls:
    - hosts:
        - typeagent.yourdomain.com
      secretName: typeagent-tls
  rules:
    - host: typeagent.yourdomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: typeagent
                port:
                  number: 80
```
Deploy:
```bash
# Apply manifests
kubectl apply -f k8s/deployment.yaml
# Check status
kubectl get all -n typeagent
# View logs
kubectl logs -n typeagent -l app=typeagent -f
# Scale replicas
kubectl scale deployment/typeagent -n typeagent --replicas=5
# Rolling update
kubectl set image deployment/typeagent typeagent=typeagent:v1.1.0 -n typeagent
```
---
## Reverse Proxy (Nginx)
### Nginx Configuration
Create `/etc/nginx/sites-available/typeagent`:
```nginx
upstream typeagent {
    # Multiple backend servers for load balancing
    server 127.0.0.1:8765 max_fails=3 fail_timeout=30s;
    # server 127.0.0.1:8766 max_fails=3 fail_timeout=30s;
    # server 127.0.0.1:8767 max_fails=3 fail_timeout=30s;
    keepalive 32;
}
# Rate limiting
limit_req_zone $binary_remote_addr zone=typeagent_limit:10m rate=10r/s;
server {
    listen 80;
    server_name typeagent.yourdomain.com;
    # Redirect to HTTPS
    return 301 https://$server_name$request_uri;
}
server {
    listen 443 ssl http2;
    server_name typeagent.yourdomain.com;
    # SSL Configuration
    ssl_certificate /etc/letsencrypt/live/typeagent.yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/typeagent.yourdomain.com/privkey.pem;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers HIGH:!aNULL:!MD5;
    ssl_prefer_server_ciphers on;
    # Security headers
    add_header X-Frame-Options "SAMEORIGIN" always;
    add_header X-Content-Type-Options "nosniff" always;
    add_header X-XSS-Protection "1; mode=block" always;
    add_header Referrer-Policy "no-referrer-when-downgrade" always;
    # Logging
    access_log /var/log/nginx/typeagent-access.log combined;
    error_log /var/log/nginx/typeagent-error.log warn;
    # Client limits
    client_max_body_size 10M;
    client_body_timeout 60s;
    location / {
        # Rate limiting
        limit_req zone=typeagent_limit burst=20 nodelay;
        # Proxy settings
        proxy_pass http://typeagent;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header Connection "";
        # Timeouts
        proxy_connect_timeout 60s;
        proxy_send_timeout 60s;
        proxy_read_timeout 60s;
        # Buffering
        proxy_buffering on;
        proxy_buffer_size 4k;
        proxy_buffers 8 4k;
    }
    # Health check endpoint (no rate limiting)
    location /health {
        proxy_pass http://typeagent;
        access_log off;
    }
}
```
Enable and test:
```bash
# Enable site
sudo ln -s /etc/nginx/sites-available/typeagent /etc/nginx/sites-enabled/
# Test configuration (run after enabling so the new site is included in the check)
sudo nginx -t
# Reload nginx
sudo systemctl reload nginx
# Check status
sudo systemctl status nginx
```
---
## Monitoring
### Prometheus Metrics
A metrics endpoint is not available yet; it is planned as a future enhancement:
```rust
// In main.rs
Router::new()
.route("/metrics", get(metrics_handler))
```
### Logging
Structured logging with `tracing`:
```bash
# Set log level
export RUST_LOG=debug
# JSON format
export RUST_LOG_FORMAT=json
# Specific module
export RUST_LOG=typedialog_ag_core=debug,typedialog_ag=info
```
### Health Checks
```bash
# Simple health check
curl http://localhost:8765/health
# Detailed health check script
cat > health-check.sh << 'EOF'
#!/bin/bash
RESPONSE=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:8765/health)
if [ "$RESPONSE" = "200" ]; then
    echo "OK"
    exit 0
else
    echo "FAIL: HTTP $RESPONSE"
    exit 1
fi
EOF
chmod +x health-check.sh
```
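A single probe right after a restart can fail simply because the server is still starting. The sketch below retries any probe command a fixed number of times; `wait_until` is a hypothetical helper name, and the attempt count and delay in the example are arbitrary values to adjust.

```bash
#!/usr/bin/env bash
# Retry a probe command until it succeeds or the attempts run out.
# Usage: wait_until <attempts> <delay-seconds> <command> [args...]

wait_until() {
    local attempts="$1" delay="$2"
    shift 2
    local i
    for ((i = 1; i <= attempts; i++)); do
        if "$@"; then
            return 0
        fi
        # Sleep between attempts, but not after the last one.
        if ((i < attempts)); then
            sleep "$delay"
        fi
    done
    return 1
}

# Example: allow up to 10 probes, 3 seconds apart, for the server to come up.
#   wait_until 10 3 curl -fsS -o /dev/null http://localhost:8765/health
```

Because the probe is passed as a command, the same helper works with `health-check.sh` or any other check that signals success through its exit code.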
---
## Security
### API Key Management
**DO NOT:**
- Commit API keys to Git
- Hardcode keys in configuration files
- Share keys in plain text

**DO:**
- Use environment variables
- Use secrets management (Vault, AWS Secrets Manager)
- Rotate keys regularly
- Use separate keys per environment
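The rule against committing keys can be checked automatically before a commit or in CI. The following is a rough sketch of such a scan; the function name is hypothetical, and the pattern (`sk-` followed by at least 20 key characters) is an assumption modelled on the placeholder keys in this guide, so adjust it to the formats your providers actually issue.

```bash
#!/usr/bin/env bash
# List files under a directory that contain strings resembling API keys.
# Assumption: keys look like "sk-" followed by 20 or more key characters.
# Short placeholders such as sk-ant-your-key-here are deliberately not matched.

find_hardcoded_keys() {
    local dir="${1:-.}"
    # -r recurse, -l print file names only (never the key itself),
    # -I skip binary files, -E extended regular expressions.
    if grep -rlIE --exclude-dir=.git 'sk-[A-Za-z0-9_-]{20,}' "$dir"; then
        echo "possible hardcoded API key found" >&2
        return 1
    fi
    return 0
}

# Example: abort a commit or CI job when something suspicious is found.
#   find_hardcoded_keys . || exit 1
```

A pattern scan like this only catches keys in a known format; it complements a secrets manager rather than replacing one.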
### Network Security
```bash
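# Firewall rules (UFW)
# Allow SSH first so that enabling the firewall does not cut off remote access
sudo ufw allow ssh
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
# Open 8765 only if clients connect to the server directly rather than through Nginx
sudo ufw allow 8765/tcp
sudo ufw enable
# Or iptables
sudo iptables -A INPUT -p tcp --dport 8765 -j ACCEPT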
### HTTPS/TLS
Use Let's Encrypt for free SSL certificates:
```bash
# Install certbot
sudo apt-get install certbot python3-certbot-nginx
# Get certificate
sudo certbot --nginx -d typeagent.yourdomain.com
# Verify that renewal works (dry run; no certificates are changed)
sudo certbot renew --dry-run
```
---
## Performance Tuning
### System Limits
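Edit `/etc/security/limits.conf`. These limits apply to PAM login sessions only; a systemd service does not read this file, so for the unit described earlier set `LimitNOFILE=65536` and `LimitNPROC=4096` in its `[Service]` section instead: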
```
typeagent soft nofile 65536
typeagent hard nofile 65536
typeagent soft nproc 4096
typeagent hard nproc 4096
```
### Cache Tuning
```yaml
# ~/.typeagent/config.yaml
cache:
strategy: Both
max_entries: 5000 # Increase for high traffic
cache_dir: /var/cache/typeagent
```
### Rust Performance
Build with optimizations:
```toml
# Cargo.toml
[profile.release]
lto = true
codegen-units = 1
opt-level = 3
```
---
## Backup & Recovery
### Backup Strategy
```bash
#!/bin/bash
# backup-typeagent.sh
set -euo pipefail
DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_DIR="/backup/typeagent/$DATE"
mkdir -p "$BACKUP_DIR"
# Backup agents
cp -r /opt/typeagent/agents "$BACKUP_DIR/"
# Backup configuration
cp ~/.typeagent/config.yaml "$BACKUP_DIR/"
# Backup cache (optional)
# tar -czf "$BACKUP_DIR/cache.tar.gz" /var/cache/typeagent
# Compress (-C stores paths relative to the backup directory: agents/, config.yaml)
tar -czf "/backup/typeagent-$DATE.tar.gz" -C "$BACKUP_DIR" .
# Clean old backups (keep 30 days)
find /backup -name "typeagent-*.tar.gz" -mtime +30 -delete
```
### Recovery
```bash
# Extract backup into ./backup (archive names include the time of the backup)
mkdir -p backup
tar -xzf typeagent-20241223_031500.tar.gz -C backup
# Restore agents
sudo cp -r backup/agents/* /opt/typeagent/agents/
# Restore config
cp backup/config.yaml ~/.typeagent/
# Restart service
sudo systemctl restart typeagent
```
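A backup is only useful if it can be read back. Before relying on an archive, or before restoring from it, it can be checked without extracting anything. The sketch below is one way to do that; `verify_backup` is a hypothetical helper, and the only layout assumption is that the archive contains an `agents/` directory at some depth.

```bash
#!/usr/bin/env bash
# Check that a backup archive is readable and contains an agents/ directory.

verify_backup() {
    local archive="$1" listing
    # tar -t lists members without extracting; a corrupt archive makes it fail.
    listing=$(tar -tzf "$archive") || {
        echo "unreadable archive: $archive" >&2
        return 1
    }
    # Accept agents/ at the top level or nested below other directories.
    if ! printf '%s\n' "$listing" | grep -qE '(^|/)agents/'; then
        echo "no agents/ directory in $archive" >&2
        return 1
    fi
}

# Example:
#   verify_backup /backup/typeagent-20241223_031500.tar.gz && echo "backup looks usable"
```

Listing the archive confirms it decompresses and has the expected shape; it does not prove that every agent file is complete, so an occasional full test restore is still worthwhile.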
---
## Next Steps
1. Choose deployment method
2. Configure monitoring
3. Set up backups
4. Configure SSL/TLS
5. Test failover procedures
6. Document your specific deployment

For issues, see the troubleshooting section of [installation.md](installation.md).