# Provisioning Platform Deployment Guide
**Version**: 3.0.0
**Date**: 2025-10-06
**Deployment Modes**: Solo, Multi-User, CI/CD, Enterprise

---
## Table of Contents
1. [Overview](#overview)
2. [Prerequisites](#prerequisites)
3. [Deployment Modes](#deployment-modes)
4. [Quick Start](#quick-start)
5. [Configuration](#configuration)
6. [Deployment Methods](#deployment-methods)
7. [Post-Deployment](#post-deployment)
8. [Troubleshooting](#troubleshooting)
---
## Overview
The Provisioning Platform is a comprehensive infrastructure automation system that can be deployed in four modes:
- **Solo**: Single-user local development (minimal services)
- **Multi-User**: Team collaboration with source control
- **CI/CD**: Automated deployment pipelines
- **Enterprise**: Full production with monitoring, KMS, and audit logging
### Architecture Components
| Component | Solo | Multi-User | CI/CD | Enterprise |
|-----------|------|------------|-------|------------|
| Orchestrator | ✓ | ✓ | ✓ | ✓ |
| Control Center | ✓ | ✓ | ✓ | ✓ |
| CoreDNS | ✓ | ✓ | ✓ | ✓ |
| OCI Registry (Zot) | ✓ | ✓ | ✓ | - |
| Extension Registry | ✓ | ✓ | ✓ | ✓ |
| Gitea | - | ✓ | ✓ | ✓ |
| PostgreSQL | - | ✓ | ✓ | ✓ |
| API Server | - | - | ✓ | ✓ |
| Harbor | - | - | - | ✓ |
| Cosmian KMS | - | - | - | ✓ |
| Prometheus | - | - | - | ✓ |
| Grafana | - | - | - | ✓ |
| Loki + Promtail | - | - | - | ✓ |
| Elasticsearch + Kibana | - | - | - | ✓ |
| Nginx Reverse Proxy | - | - | - | ✓ |
---
## Prerequisites
### Required Software
1. **Docker** (version 20.10+)
```bash
docker --version
# Docker version 20.10.0 or higher
```
2. **Docker Compose** (version 2.0+)
```bash
docker-compose --version
# Docker Compose version 2.0.0 or higher
# If Compose is installed as a Docker CLI plugin, run: docker compose version
```
3. **Nushell** (version 0.107.1+ for automation scripts)
```bash
nu --version
# 0.107.1 or higher
```
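The version checks above can also be scripted. The following is a minimal sketch, assuming plain dotted numeric versions (pre-release suffixes such as `-rc1` are not handled). `extract_version` and `version_ge` are hypothetical helper names, not part of the platform scripts.

```bash
# Pull the first dotted version number out of a tool's --version output.
extract_version() {
  grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1
}

# version_ge A B: succeed when dotted version A is greater than or equal to B.
version_ge() {
  local a=$1 b=$2 x y
  while [ -n "$a" ] || [ -n "$b" ]; do
    x=${a%%.*}; y=${b%%.*}   # leading component of each version
    x=${x:-0}; y=${y:-0}     # missing components count as 0
    if [ "$x" -gt "$y" ]; then return 0; fi
    if [ "$x" -lt "$y" ]; then return 1; fi
    # Drop the component that was just compared
    case $a in *.*) a=${a#*.} ;; *) a= ;; esac
    case $b in *.*) b=${b#*.} ;; *) b= ;; esac
  done
  return 0
}

# Example:
#   version_ge "$(docker --version | extract_version)" 20.10 \
#     || echo "Docker 20.10+ is required"
```

The same pair of helpers works for the Docker Compose and Nushell checks by swapping the command and the minimum version.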
### System Requirements
#### Solo Mode
- **CPU**: 2 cores
- **Memory**: 4GB RAM
- **Disk**: 20GB free space
- **Network**: Internet connection for pulling images
#### Multi-User Mode
- **CPU**: 4 cores
- **Memory**: 8GB RAM
- **Disk**: 50GB free space
- **Network**: Internet connection + internal network
#### CI/CD Mode
- **CPU**: 8 cores
- **Memory**: 16GB RAM
- **Disk**: 100GB free space
- **Network**: Internet + dedicated CI/CD network
#### Enterprise Mode
- **CPU**: 16 cores
- **Memory**: 32GB RAM
- **Disk**: 500GB free space (SSD recommended)
- **Network**: High-bandwidth, low-latency network
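The minimums above can be encoded as a small preflight check. This is a sketch only: `requirements_for` and `meets_requirements` are hypothetical helpers, the mode names are assumed to match the `--mode` values used by the deployment script, and the host values are passed in by the caller rather than detected.

```bash
# Print "<cores> <memory_gb> <disk_gb>" for a deployment mode,
# using the minimums listed in this section.
requirements_for() {
  case $1 in
    solo)       echo "2 4 20" ;;
    multi-user) echo "4 8 50" ;;
    cicd)       echo "8 16 100" ;;
    enterprise) echo "16 32 500" ;;
    *)          echo "unknown mode: $1" >&2; return 2 ;;
  esac
}

# meets_requirements MODE CORES MEM_GB DISK_GB: succeed when the given host
# values are at least the documented minimums for MODE.
meets_requirements() {
  local mode=$1 cpu=$2 mem=$3 disk=$4 req
  req=$(requirements_for "$mode") || return 2
  set -- $req   # split into: cores, memory, disk
  [ "$cpu" -ge "$1" ] && [ "$mem" -ge "$2" ] && [ "$disk" -ge "$3" ]
}

# Example (nproc is available on Linux with GNU coreutils):
#   meets_requirements multi-user "$(nproc)" 8 50 || echo "host is too small"
```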
### Optional Tools
- **OpenSSL** (for generating secrets)
- **kubectl** (for Kubernetes deployment)
- **Helm** (for Kubernetes package management)
---
## Deployment Modes
### Solo Mode
**Use Case**: Local development, testing, personal use
**Features**:
- Minimal resource usage
- No authentication required
- SQLite databases
- Local file storage
**Limitations**:
- Single user only
- No version control integration
- No audit logging
### Multi-User Mode
**Use Case**: Small team collaboration
**Features**:
- Multi-user authentication
- Gitea for source control
- PostgreSQL shared database
- User management
**Limitations**:
- No automated pipelines
- No advanced monitoring
### CI/CD Mode
**Use Case**: Automated deployment pipelines
**Features**:
- All Multi-User features
- Provisioning API Server
- Webhook support
- Jenkins/GitLab Runner integration
**Limitations**:
- Basic monitoring only
### Enterprise Mode
**Use Case**: Production deployments, compliance requirements
**Features**:
- All CI/CD features
- Harbor registry (enterprise OCI)
- Cosmian KMS (secret management)
- Full monitoring stack (Prometheus, Grafana)
- Log aggregation (Loki, Elasticsearch)
- Audit logging
- TLS/SSL encryption
- Nginx reverse proxy
---
## Quick Start
### 1. Clone Repository
```bash
cd /opt
git clone https://github.com/your-org/project-provisioning.git
cd project-provisioning/provisioning/platform
```
### 2. Generate Secrets
```bash
# Generate .env file with random secrets
./scripts/generate-secrets.nu
# Or copy and edit manually
cp .env.example .env
nano .env
```
### 3. Choose Deployment Mode and Deploy
#### Solo Mode
```bash
./scripts/deploy-platform.nu --mode solo
```
#### Multi-User Mode
```bash
# Generate secrets first
./scripts/generate-secrets.nu
# Deploy
./scripts/deploy-platform.nu --mode multi-user
```
#### CI/CD Mode
```bash
./scripts/deploy-platform.nu --mode cicd --build
```
#### Enterprise Mode
```bash
# Full production deployment
./scripts/deploy-platform.nu --mode enterprise --build --wait 600
```
### 4. Verify Deployment
```bash
# Check all services
./scripts/health-check.nu
# View logs
docker-compose logs -f
```
### 5. Access Services
- **Orchestrator**: http://localhost:8080
- **Control Center**: http://localhost:8081
- **OCI Registry**: http://localhost:5000
- **Gitea** (Multi-User+): http://localhost:3000
- **Grafana** (Enterprise): http://localhost:3001
---
## Configuration
### Environment Variables
The `.env` file controls all deployment settings. Key variables:
#### Platform Configuration
```bash
PROVISIONING_MODE=solo # solo, multi-user, cicd, enterprise
PLATFORM_ENVIRONMENT=development # development, staging, production
```
#### Service Ports
```bash
ORCHESTRATOR_PORT=8080
CONTROL_CENTER_PORT=8081
GITEA_HTTP_PORT=3000
OCI_REGISTRY_PORT=5000
```
#### Security Settings
```bash
# Generate with: openssl rand -base64 32
CONTROL_CENTER_JWT_SECRET=<random-secret>
API_SERVER_JWT_SECRET=<random-secret>
POSTGRES_PASSWORD=<random-password>
```
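Before deploying, it can help to confirm that no variable was left empty or still holds a placeholder such as `<random-secret>`. The sketch below assumes the `NAME=value` layout shown above; `unfilled_vars` is a hypothetical helper, not one of the platform scripts.

```bash
# Print the names of variables in an env file whose value is empty
# or is still an angle-bracket placeholder such as <random-secret>.
unfilled_vars() {
  grep -E '^[A-Za-z_][A-Za-z0-9_]*=(<[^>]*>)?[[:space:]]*$' "$1" | cut -d= -f1
}

# Example: list anything that still needs a real value.
#   unfilled_vars .env
```

The function prints nothing when every variable has a value, so its output can be used directly as a deployment gate.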
#### Resource Limits
```bash
ORCHESTRATOR_CPU_LIMIT=2000m
ORCHESTRATOR_MEMORY_LIMIT=2048M
```
### Configuration Files
#### Docker Compose
- **Main**: `docker-compose.yaml` (base services)
- **Solo**: `docker-compose/docker-compose.solo.yaml`
- **Multi-User**: `docker-compose/docker-compose.multi-user.yaml`
- **CI/CD**: `docker-compose/docker-compose.cicd.yaml`
- **Enterprise**: `docker-compose/docker-compose.enterprise.yaml`
#### Service Configurations
- **Orchestrator**: `orchestrator/config.defaults.toml`
- **Control Center**: `control-center/config.defaults.toml`
- **CoreDNS**: `coredns/Corefile`
- **OCI Registry**: `oci-registry/config.json`
- **Nginx**: `nginx/nginx.conf`
- **Prometheus**: `monitoring/prometheus/prometheus.yml`
---
## Deployment Methods
### Method 1: Docker Compose (Recommended)
#### Deploy
```bash
# Solo mode
docker-compose -f docker-compose.yaml \
  -f docker-compose/docker-compose.solo.yaml \
  up -d
# Multi-user mode
docker-compose -f docker-compose.yaml \
  -f docker-compose/docker-compose.multi-user.yaml \
  up -d
# CI/CD mode
docker-compose -f docker-compose.yaml \
  -f docker-compose/docker-compose.multi-user.yaml \
  -f docker-compose/docker-compose.cicd.yaml \
  up -d
# Enterprise mode
docker-compose -f docker-compose.yaml \
  -f docker-compose/docker-compose.multi-user.yaml \
  -f docker-compose/docker-compose.cicd.yaml \
  -f docker-compose/docker-compose.enterprise.yaml \
  up -d
```
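Apart from Solo, each mode stacks its overlay on top of the previous mode's files. A minimal sketch of that layering as a helper, mirroring the commands above (`compose_files` is a hypothetical name, not part of the platform scripts):

```bash
# Print the -f arguments for a deployment mode. Multi-User, CI/CD, and
# Enterprise are cumulative; Solo uses only the base file plus its own overlay.
compose_files() {
  local files="-f docker-compose.yaml"
  case $1 in
    solo)
      files="$files -f docker-compose/docker-compose.solo.yaml" ;;
    multi-user)
      files="$files -f docker-compose/docker-compose.multi-user.yaml" ;;
    cicd)
      files="$files -f docker-compose/docker-compose.multi-user.yaml"
      files="$files -f docker-compose/docker-compose.cicd.yaml" ;;
    enterprise)
      files="$files -f docker-compose/docker-compose.multi-user.yaml"
      files="$files -f docker-compose/docker-compose.cicd.yaml"
      files="$files -f docker-compose/docker-compose.enterprise.yaml" ;;
    *)
      echo "unknown mode: $1" >&2; return 2 ;;
  esac
  echo "$files"
}

# Example (the unquoted substitution is intentional so the flags split):
#   docker-compose $(compose_files cicd) up -d
#   docker-compose $(compose_files cicd) logs -f orchestrator
```

Compose only knows about services defined in the files it is given, so later `logs`, `restart`, and `down` commands should receive the same `-f` list used for `up`. Setting the `COMPOSE_FILE` environment variable to the same files is an alternative to repeating the flags.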
#### Manage Services
```bash
# View logs
docker-compose logs -f [service-name]
# Restart service
docker-compose restart orchestrator
# Stop all services
docker-compose down
# Stop and remove volumes (WARNING: data loss)
docker-compose down --volumes
```
### Method 2: Systemd (Linux Production)
#### Install Services
```bash
cd systemd
sudo ./install-services.sh
```
#### Manage via systemd
```bash
# Start platform
sudo systemctl start provisioning-platform
# Enable auto-start on boot
sudo systemctl enable provisioning-platform
# Check status
sudo systemctl status provisioning-platform
# View logs
sudo journalctl -u provisioning-platform -f
# Restart
sudo systemctl restart provisioning-platform
# Stop
sudo systemctl stop provisioning-platform
```
### Method 3: Kubernetes
See [KUBERNETES_DEPLOYMENT.md](./KUBERNETES_DEPLOYMENT.md) for detailed instructions.
#### Quick Deploy
```bash
# Create namespace
kubectl apply -f k8s/base/namespace.yaml
# Deploy services
kubectl apply -f k8s/deployments/
kubectl apply -f k8s/services/
kubectl apply -f k8s/ingress/
# Check status
kubectl get pods -n provisioning
```
### Method 4: Automation Script (Nushell)
```bash
# Deploy with options
./scripts/deploy-platform.nu --mode enterprise \
  --build \
  --wait 300
# Health check
./scripts/health-check.nu
# Dry run (show what would be deployed)
./scripts/deploy-platform.nu --mode enterprise --dry-run
```
---
## Post-Deployment
### 1. Verify Services
```bash
# Quick health check
./scripts/health-check.nu
# Detailed Docker status
docker-compose ps
# Check individual service
curl http://localhost:8080/health
```
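Services may need some time before their health endpoints respond, so a single `curl` right after deployment can fail spuriously. A small retry loop is one way to handle this. The sketch below is generic; `wait_until` is a hypothetical helper, and the timeout is approximate because it counts only the sleep intervals.

```bash
# wait_until TIMEOUT INTERVAL CMD...: rerun CMD every INTERVAL seconds until it
# succeeds; give up once about TIMEOUT seconds of waiting have accumulated.
wait_until() {
  local timeout=$1 interval=$2 waited=0
  shift 2
  until "$@"; do
    if [ "$waited" -ge "$timeout" ]; then return 1; fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}

# Example: wait up to 5 minutes for the orchestrator health endpoint.
#   wait_until 300 5 curl -fsS -o /dev/null http://localhost:8080/health
```

With `-f`, `curl` exits non-zero on HTTP error responses, so the loop keeps polling until the endpoint answers successfully.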
### 2. Initial Configuration
#### Create Admin User (Multi-User+)
Access Gitea at http://localhost:3000 and complete the setup wizard to create the administrator account.
#### Configure DNS (Optional)
Add to `/etc/hosts` or configure local DNS:
```
127.0.0.1 provisioning.local
127.0.0.1 gitea.provisioning.local
127.0.0.1 grafana.provisioning.local
```
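If the entries are added by script, it is worth making the step idempotent so reruns do not duplicate lines. A minimal sketch, assuming the usual hosts format of an address followed by one or more names; `add_host_entry` is a hypothetical helper, and writing to `/etc/hosts` requires root.

```bash
# add_host_entry FILE IP NAME: append "IP NAME" to FILE unless a
# non-comment line already lists NAME as a hostname.
add_host_entry() {
  local hosts_file=$1 ip=$2 name=$3
  if awk -v n="$name" '
      /^[ \t]*#/ { next }                                   # ignore comment lines
      { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }  # names follow the address
      END { exit !found }
    ' "$hosts_file"; then
    return 0
  fi
  printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
}

# Example (run as root):
#   for h in provisioning.local gitea.provisioning.local grafana.provisioning.local; do
#     add_host_entry /etc/hosts 127.0.0.1 "$h"
#   done
```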
#### Configure Monitoring (Enterprise)
1. Access Grafana: http://localhost:3001
2. Log in with credentials from `.env`:
   - Username: `admin`
   - Password: `${GRAFANA_ADMIN_PASSWORD}`
3. Dashboards are auto-provisioned from `monitoring/grafana/dashboards/`
### 3. Load Extensions
```bash
# List available extensions
curl http://localhost:8082/api/v1/extensions
# Upload extension (example)
curl -X POST http://localhost:8082/api/v1/extensions/upload \
  -F "file=@my-extension.tar.gz"
```
### 4. Test Workflows
```bash
# Create test server (via orchestrator API)
curl -X POST http://localhost:8080/workflows/servers/create \
  -H "Content-Type: application/json" \
  -d '{"name": "test-server", "plan": "1xCPU-2GB"}'
# Check workflow status
curl http://localhost:8080/tasks/<task-id>
```
---
## Troubleshooting
### Common Issues
#### Services Not Starting
**Symptom**: `docker-compose up` fails or services crash
**Solutions**:
1. Check Docker daemon:
```bash
systemctl status docker
```
2. Check logs:
```bash
docker-compose logs orchestrator
```
3. Check resource limits:
```bash
docker stats
```
4. If you run Docker Desktop, increase the CPU and memory allocation in its settings
#### Port Conflicts
**Symptom**: `Error: port is already allocated`
**Solutions**:
1. Find conflicting process:
```bash
lsof -i :8080
```
2. Change port in `.env`:
```bash
ORCHESTRATOR_PORT=9080
```
3. Restart deployment:
```bash
docker-compose down
docker-compose up -d
```
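To check every configured port in one pass rather than one `lsof` call at a time, the port variables can be read from `.env`. This sketch assumes port settings follow the `*_PORT=<number>` pattern shown in the Configuration section; `env_ports` is a hypothetical helper.

```bash
# Print "NAME PORT" for every *_PORT variable in an env file.
env_ports() {
  sed -nE 's/^([A-Za-z0-9_]+_PORT)=([0-9]+).*/\1 \2/p' "$1"
}

# Example: before deploying, report configured ports that already have a listener.
#   env_ports .env | while read -r name port; do
#     lsof -iTCP:"$port" -sTCP:LISTEN >/dev/null 2>&1 && echo "$name ($port) is in use"
#   done
```

Run the check while the platform is stopped; otherwise its own services show up as listeners.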
#### Health Checks Failing
**Symptom**: Health check script reports unhealthy services
**Solutions**:
1. Check service logs:
```bash
docker-compose logs -f <service>
```
2. Verify network connectivity:
```bash
docker network inspect provisioning-net
```
3. Check firewall rules:
```bash
sudo ufw status
```
4. Wait longer for services to start:
```bash
./scripts/deploy-platform.nu --wait 600
```
#### Database Connection Errors
**Symptom**: PostgreSQL connection refused
**Solutions**:
1. Check PostgreSQL health:
```bash
docker exec provisioning-postgres pg_isready
```
2. Verify credentials in `.env`:
```bash
grep POSTGRES_ .env
```
3. Check PostgreSQL logs:
```bash
docker-compose logs postgres
```
4. Recreate database (WARNING: deletes all PostgreSQL data; back up first):
```bash
docker-compose down
docker volume rm provisioning_postgres-data
docker-compose up -d
```
#### Out of Disk Space
**Symptom**: No space left on device
**Solutions**:
1. Remove unused Docker volumes (WARNING: data in volumes not used by any container is lost):
```bash
docker volume prune
```
2. Clean Docker images:
```bash
docker image prune -a
```
3. Check volume sizes:
```bash
docker system df -v
```
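Free space can also be checked ahead of time instead of waiting for the error. A minimal sketch using POSIX `df`; `free_gb` is a hypothetical helper, and `/var/lib/docker` in the example is Docker's default data directory on Linux, which may differ on your host.

```bash
# Print the free space, in whole GiB, on the filesystem that holds PATH.
free_gb() {
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Example: warn when less than 20 GiB is left for Docker data.
#   [ "$(free_gb /var/lib/docker)" -ge 20 ] || echo "low disk space"
```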
### Getting Help
- **Logs**: Always check logs first: `docker-compose logs -f`
- **Health**: Run health check: `./scripts/health-check.nu --json`
- **Documentation**: See `docs/` directory
- **Issues**: File bug reports in the project's GitHub repository
---
## Security Best Practices
### 1. Secret Management
- **Never commit** `.env` files to version control
- Use `./scripts/generate-secrets.nu` to generate strong secrets
- Rotate secrets regularly
- Use KMS in enterprise mode
### 2. Network Security
- Use TLS/SSL in production (enterprise mode)
- Configure firewall rules:
```bash
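# Allow SSH first if you manage the host remotely; enabling ufw without it can lock you out
sudo ufw allow 22/tcp
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw enable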
```
- Use private networks for backend services
### 3. Access Control
- Enable authentication in multi-user mode
- Use strong passwords (16+ characters)
- Configure API keys for CI/CD access
- Enable audit logging in enterprise mode
### 4. Regular Updates
```bash
# Pull latest images
docker-compose pull
# Rebuild with updates
./scripts/deploy-platform.nu --pull --build
```
---
## Backup and Recovery
### Backup
```bash
# Create the backup directory if it does not exist yet
mkdir -p backup
# Backup volumes
docker run --rm -v provisioning_orchestrator-data:/data \
  -v "$(pwd)/backup:/backup" \
  alpine tar czf /backup/orchestrator-data.tar.gz -C /data .
# Backup PostgreSQL
docker exec provisioning-postgres pg_dumpall -U provisioning > backup/postgres-backup.sql
```
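A backup is only useful if it can be read back, so it is worth verifying each archive after it is written. The sketch below is one way to do that and to keep successive backups from overwriting each other; `verify_archive` and `backup_name` are hypothetical helpers, not part of the platform scripts.

```bash
# Succeed only when FILE is a non-empty file that tar can list
# as a gzip-compressed archive.
verify_archive() {
  [ -s "$1" ] && tar -tzf "$1" >/dev/null 2>&1
}

# Print a timestamped archive name, e.g. PREFIX-20251006-101500.tar.gz
backup_name() {
  printf '%s-%s.tar.gz\n' "$1" "$(date +%Y%m%d-%H%M%S)"
}

# Example:
#   verify_archive backup/orchestrator-data.tar.gz || echo "backup is unreadable"
```

Listing the archive checks that it decompresses and parses, not that the data inside is complete; a periodic test restore is still the stronger check.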
### Restore
```bash
# Restore volume (stop the services that use it first)
docker run --rm -v provisioning_orchestrator-data:/data \
  -v "$(pwd)/backup:/backup" \
  alpine tar xzf /backup/orchestrator-data.tar.gz -C /data
# Restore PostgreSQL
docker exec -i provisioning-postgres psql -U provisioning < backup/postgres-backup.sql
```
---
## Maintenance
### Updates
```bash
# Pull latest images
docker-compose pull
# Recreate containers
docker-compose up -d --force-recreate
# Remove old images
docker image prune
```
### Monitoring
- **Prometheus** (Enterprise): http://localhost:9090
- **Grafana** (Enterprise): http://localhost:3001
- **Logs**: `docker-compose logs -f`
### Health Checks
```bash
# Automated health check
./scripts/health-check.nu
# Manual checks
curl http://localhost:8080/health
curl http://localhost:8081/health
```
---
## Next Steps
- [Production Deployment Guide](./PRODUCTION_DEPLOYMENT.md)
- [Kubernetes Deployment Guide](./KUBERNETES_DEPLOYMENT.md)
- [Docker Compose Reference](./DOCKER_COMPOSE_REFERENCE.md)
- [Monitoring Setup](./MONITORING_SETUP.md)
- [Security Hardening](./SECURITY_HARDENING.md)
---
**Documentation Version**: 1.0.0
**Last Updated**: 2025-10-06
**Maintained By**: Platform Team