Jesús Pérez a395bd972f
2026-01-12 03:36:55 +00:00


# GitHub Actions CI/CD Guide for VAPORA Provisioning
Complete guide for setting up and using GitHub Actions workflows for VAPORA deployment automation.
## Overview
Five integrated GitHub Actions workflows provide end-to-end CI/CD automation:
1. **validate-and-build.yml** - Configuration validation and artifact generation
2. **deploy-docker.yml** - Docker Compose deployment automation
3. **deploy-kubernetes.yml** - Kubernetes deployment automation
4. **health-check.yml** - Automated health monitoring and diagnostics
5. **rollback.yml** - Safe deployment rollback with pre-checks
---
## Quick Setup
### 1. Prerequisites
- GitHub repository with access to Actions
- Docker Hub account (for image pushes, optional)
- Kubernetes cluster with kubeconfig (for K8s deployments)
- Slack workspace (for notifications, optional)
### 2. Required Secrets
Add these secrets to your GitHub repository (Settings → Secrets and variables → Actions):
```
# Kubeconfig for Kubernetes deployments
KUBE_CONFIG_CI # For CI/test cluster (optional)
KUBE_CONFIG_STAGING # For staging Kubernetes cluster
KUBE_CONFIG_PRODUCTION # For production Kubernetes cluster
# Optional: Slack notifications
SLACK_WEBHOOK # Default Slack webhook
SLACK_WEBHOOK_ALERTS # Critical alerts webhook
# Optional: Docker registry
DOCKER_USERNAME # Docker Hub username
DOCKER_PASSWORD # Docker Hub access token
```
### 3. Encode Kubeconfig for Secrets
```bash
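# Convert kubeconfig to a single-line base64 string
# (GNU base64 wraps output at 76 columns; tr removes the line breaks)
base64 < ~/.kube/config | tr -d '\n'
# Store in GitHub Secrets as KUBE_CONFIG_STAGING, etc.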
```
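The encoding step can be made a little more defensive by checking that the string decodes back to the original file before you store it. This is a minimal Bash sketch, assuming GNU coreutils `base64` (for `-d`); `encode_kubeconfig` and `verify_kubeconfig_roundtrip` are hypothetical helpers, not part of the VAPORA workflows.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: encode a kubeconfig for a GitHub secret and check it round-trips.

# Print FILE as one base64 line. GNU base64 wraps output at 76 columns,
# so strip the newlines to get a single-line secret value.
encode_kubeconfig() {
  base64 < "$1" | tr -d '\n'
}

# Succeed only if ENCODED decodes back to FILE byte-for-byte.
verify_kubeconfig_roundtrip() {
  local file=$1 encoded=$2
  printf '%s' "$encoded" | base64 -d | cmp -s - "$file"
}
```

For example, run `encoded=$(encode_kubeconfig ~/.kube/config)`, then `verify_kubeconfig_roundtrip ~/.kube/config "$encoded"`. If you use the GitHub CLI, `printf '%s' "$encoded" | gh secret set KUBE_CONFIG_STAGING` stores the value from stdin, which keeps it out of your shell history.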
### 4. Enable GitHub Actions
1. Go to repository Settings
2. Click "Actions" → "General"
3. Enable "Allow all actions and reusable workflows"
4. Set "Workflow permissions" to "Read and write permissions" (the workflows create issues and post PR comments)
---
## Workflows in Detail
### 1. Validate & Build (validate-and-build.yml)
**Purpose**: Validate all configurations and generate deployment artifacts
**Triggers**:
- Push to `main` or `develop` branches (if provisioning files change)
- Manual dispatch with custom mode selection
- Pull requests affecting provisioning
**Jobs**:
- `validate-configs` - Validates solo, multiuser, and enterprise modes
- `build-artifacts` - Generates JSON, TOML, YAML, and Kubernetes manifests
**Outputs**:
- `deployment-artifacts` - All configuration and manifest files
- `build-logs` - Pipeline execution logs
- `validation-logs-*` - Per-mode validation reports
**Usage**:
```bash
# Automatic on push
git commit -m "Update provisioning config"
git push origin main
# Manual trigger
# Go to Actions → Validate & Build → Run workflow
# Select mode: solo, multiuser, or enterprise
```
**Example Outputs**:
```
artifacts/
├── config-solo.json
├── config-multiuser.json
├── config-enterprise.json
├── vapora-solo.toml
├── vapora-multiuser.toml
├── vapora-enterprise.toml
├── vapora-solo.yaml
├── vapora-multiuser.yaml
├── vapora-enterprise.yaml
├── configmap.yaml
├── deployment.yaml
├── docker-compose.yml
└── MANIFEST.md
```
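After downloading `deployment-artifacts`, a quick completeness check avoids deploying from a partial set. This is a minimal Bash sketch; the expected names are copied from the example listing above, and `check_artifacts` / `EXPECTED_ARTIFACTS` are hypothetical, so adjust the list if your build produces different files.

```bash
#!/usr/bin/env bash
# Hypothetical helper: confirm a downloaded deployment-artifacts directory is complete.
# File names mirror the example output listing; edit them to match your build.
EXPECTED_ARTIFACTS=(
  config-solo.json config-multiuser.json config-enterprise.json
  vapora-solo.toml vapora-multiuser.toml vapora-enterprise.toml
  vapora-solo.yaml vapora-multiuser.yaml vapora-enterprise.yaml
  configmap.yaml deployment.yaml docker-compose.yml MANIFEST.md
)

# Print each missing file name and return non-zero if any are missing.
check_artifacts() {
  local dir=$1 missing=0 name
  for name in "${EXPECTED_ARTIFACTS[@]}"; do
    if [[ ! -f "$dir/$name" ]]; then
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}
```

With the GitHub CLI, `gh run download <run-id> --name deployment-artifacts --dir artifacts` followed by `check_artifacts artifacts` fetches and checks a run's artifacts (`<run-id>` is a placeholder).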
---
### 2. Deploy to Docker (deploy-docker.yml)
**Purpose**: Deploy VAPORA to Docker Compose
**Triggers**:
- Manual dispatch with configuration options
- Automatic trigger after validate-and-build on `develop` branch
**Required Inputs**:
- `mode` - Deployment mode (solo, multiuser, enterprise)
- `environment` - Target environment (development, staging, production)
- `dry_run` - Test without actual deployment
**Features**:
- Validates Docker Compose configuration
- Pulls base images
- Starts services
- Performs health checks
- Auto-comments on PRs with deployment details
- Slack notifications
**Usage**:
Via the GitHub UI:

1. Go to Actions → Deploy to Docker
2. Click "Run workflow"
3. Select:
   - Mode: multiuser
   - Dry run: false
   - Environment: staging
4. Click "Run workflow"
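The same dispatch can be scripted with the GitHub CLI (`gh workflow run` with `-f key=value` inputs). This is a minimal Bash sketch, assuming the workflow file is `deploy-docker.yml` and its inputs are named `mode`, `environment` and `dry_run` as listed above; `deploy_docker` and the `GH` override are hypothetical. It defaults `dry_run` to `true` so a forgotten argument never deploys.

```bash
#!/usr/bin/env bash
# Hypothetical helper: dispatch deploy-docker.yml from the GitHub CLI instead of the UI.
# Set GH=echo to print the command instead of running it.
GH=${GH:-gh}

# deploy_docker MODE ENVIRONMENT [DRY_RUN]; DRY_RUN defaults to true.
deploy_docker() {
  local mode=$1 environment=$2 dry_run=${3:-true}
  case "$mode" in
    solo|multiuser|enterprise) ;;
    *) echo "invalid mode: $mode" >&2; return 2 ;;
  esac
  case "$environment" in
    development|staging|production) ;;
    *) echo "invalid environment: $environment" >&2; return 2 ;;
  esac
  case "$dry_run" in
    true|false) ;;
    *) echo "dry_run must be true or false" >&2; return 2 ;;
  esac
  "$GH" workflow run deploy-docker.yml \
    -f mode="$mode" -f environment="$environment" -f dry_run="$dry_run"
}
```

The same pattern should work for `deploy-kubernetes.yml` by adding its `rollout_timeout` input; validating values locally catches typos before a run is queued.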
**Service Endpoints** (after deployment):
```
- Backend: http://localhost:8001
- Frontend: http://localhost:3000
- Agents: http://localhost:8002
- LLM Router: http://localhost:8003
- SurrealDB: http://localhost:8000
- Health: http://localhost:8001/health
```
**Local testing with the same files**:
```bash
# After downloading the workflow artifacts
cd deploy/docker
docker compose up -d
# View logs
docker compose logs -f backend
# Check health
curl http://localhost:8001/health
```
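Services started with `docker compose up -d` can take a few seconds before `/health` answers, so a single `curl` right after startup may fail. This is a minimal Bash sketch of a retry wrapper; `retry` is a hypothetical helper, not something shipped with VAPORA.

```bash
#!/usr/bin/env bash
# Hypothetical helper: retry a command until it succeeds.
# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
  local attempts=$1 delay=$2 i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  echo "failed after $attempts attempts: $*" >&2
  return 1
}
```

For example, `retry 10 3 curl -fsS http://localhost:8001/health` tries up to 10 times, 3 seconds apart; `-f` makes curl exit non-zero on HTTP errors such as 503, so those count as failed attempts.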
---
### 3. Deploy to Kubernetes (deploy-kubernetes.yml)
**Purpose**: Deploy VAPORA to Kubernetes cluster
**Triggers**:
- Manual dispatch (`workflow_dispatch`) with environment selection and full configuration options
**Required Inputs**:
- `mode` - Deployment mode
- `environment` - Target environment (staging, production)
- `dry_run` - Dry-run test (recommended first)
- `rollout_timeout` - Max time to wait for rollout (default: 300s)
**Features**:
- Validates Kubernetes manifests
- Creates the `vapora` namespace
- Applies a ConfigMap with the configuration
- Deploys all three services (backend, agents, llm-router)
- Waits for rollout completion
- Performs health checks
- Annotates deployments for change tracking
- Slack notifications
**Usage**:
Via the GitHub UI:

1. Go to Actions → Deploy to Kubernetes
2. Click "Run workflow"
3. Select:
   - Mode: enterprise
   - Environment: staging
   - Dry run: true (always test first!)
   - Rollout timeout: 300
4. Click "Run workflow"

After verifying the dry run, re-run with `dry_run: false`.
**Deployment Steps**:
1. Validate manifests (dry-run)
2. Create vapora namespace
3. Apply ConfigMap
4. Apply Deployments
5. Wait for backend rollout (`rollout_timeout`, default 300s)
6. Wait for agents rollout
7. Wait for llm-router rollout
8. Verify pod health
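Steps 5-7 can be approximated with `kubectl rollout status`, which blocks until a deployment finishes rolling out or `--timeout` expires. This is a Bash sketch, not the workflow's source. The names `vapora-agents` and `vapora-llm-router` are assumptions (only `vapora-backend` appears in this guide), and `KUBECTL` / `NAMESPACE` are hypothetical overrides so the function can be exercised with a stub.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: wait for each deployment's rollout in turn.
KUBECTL=${KUBECTL:-kubectl}
NAMESPACE=${NAMESPACE:-vapora}

# wait_for_rollouts [TIMEOUT_SECONDS]; stops at the first rollout that fails.
wait_for_rollouts() {
  local timeout=${1:-300} name
  for name in vapora-backend vapora-agents vapora-llm-router; do
    echo "waiting for deployment/$name (timeout ${timeout}s)"
    if ! "$KUBECTL" rollout status "deployment/$name" -n "$NAMESPACE" --timeout="${timeout}s"; then
      echo "rollout did not complete: $name" >&2
      return 1
    fi
  done
}
```

Waiting sequentially keeps the failure report simple: the first deployment that does not become ready is the one named in the error.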
**Verification Commands**:
```bash
# Check deployments
kubectl get deployments -n vapora
kubectl get pods -n vapora
# View logs
kubectl logs -f deployment/vapora-backend -n vapora
# Check events
kubectl get events -n vapora --sort-by='.lastTimestamp'
# Port forward for local testing
kubectl port-forward -n vapora svc/vapora-backend 8001:8001
curl http://localhost:8001/health
# View rollout history
kubectl rollout history deployment/vapora-backend -n vapora
```
---
### 4. Health Check & Monitoring (health-check.yml)
**Purpose**: Scheduled and on-demand health monitoring for Docker and Kubernetes deployments
**Triggers**:
- Schedule: Every 15 minutes
- Schedule: Every 6 hours
- Manual dispatch with custom parameters
**Features**:
- Docker: Container status, HTTP health checks
- Kubernetes: Deployment replicas, pod phases, service health
- Automatic issue creation on failures
- Diagnostics collection
- Slack notifications
**Usage**:
Via the GitHub UI (manual run):

1. Go to Actions → Health Check & Monitoring
2. Click "Run workflow"
3. Select:
   - Target: kubernetes
   - Count: 5 (run 5 checks)
   - Interval: 30 (30 seconds between checks)
4. Click "Run workflow"
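The Count and Interval inputs can be read as "repeat the check N times, M seconds apart, and report the failures". This is a minimal Bash sketch of that interpretation; `run_checks` is hypothetical and the workflow's actual implementation may differ.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a check COUNT times, INTERVAL seconds apart,
# and report how many attempts failed. Returns non-zero if any failed.
run_checks() {
  local count=$1 interval=$2 failures=0 i
  shift 2
  for ((i = 1; i <= count; i++)); do
    if "$@"; then
      echo "check $i/$count: ok"
    else
      echo "check $i/$count: FAILED"
      failures=$((failures + 1))
    fi
    if ((i < count)); then
      sleep "$interval"
    fi
  done
  echo "$failures of $count checks failed"
  ((failures == 0))
}
```

For example, `run_checks 5 30 curl -fsS -o /dev/null http://localhost:8001/health` mirrors the Count 5 / Interval 30 selection above against a port-forwarded backend.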
**Automatic Monitoring**:
- Every 15 minutes: Quick health check
- Every 6 hours: Comprehensive diagnostics
**What Gets Checked** (Kubernetes):
- Deployment replica status
- Pod readiness conditions
- Service availability
- ConfigMap data
- Recent events
- Resource usage (requires metrics-server)
**What Gets Checked** (Docker):
- Container status (Up/Down)
- HTTP endpoint health (200 status)
- Service responsiveness
- Docker network status
- Docker volumes
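The HTTP part of the Docker check can be sketched in Bash by probing each endpoint listed under Deploy to Docker and treating anything other than 200 as a failure. Assumptions: only the backend's `/health` path appears in this guide, so the other services are probed at their root URL (adjust to each service's real health path); `check_endpoints`, `ENDPOINTS` and the `CURL` override are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the "HTTP endpoint health (200 status)" check.
# CURL can be overridden with a stub for testing.
CURL=${CURL:-curl}
ENDPOINTS=(
  "backend http://localhost:8001/health"
  "frontend http://localhost:3000"
  "agents http://localhost:8002"
  "llm-router http://localhost:8003"
)

check_endpoints() {
  local entry name url code failed=0
  for entry in "${ENDPOINTS[@]}"; do
    read -r name url <<< "$entry"
    # -w '%{http_code}' prints the status; curl prints 000 if it cannot connect.
    code=$("$CURL" -s -o /dev/null -w '%{http_code}' --max-time 5 "$url") || true
    if [[ "$code" == 200 ]]; then
      echo "$name: ok"
    else
      echo "$name: HTTP $code"
      failed=1
    fi
  done
  return "$failed"
}
```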
**Reports Generated**:
- `docker-health.log` - Docker health check output
- `k8s-health.log` - Kubernetes health check output
- `k8s-diagnostics.log` - Full K8s diagnostics
- `docker-diagnostics.log` - Full Docker diagnostics
- `HEALTH_REPORT.md` - Summary report
---
### 5. Rollback Deployment (rollback.yml)
**Purpose**: Safe deployment rollback with pre-checks and verification
**Triggers**:
- Manual dispatch only (safety feature)
**Required Inputs**:
- `target` - Rollback target (kubernetes or docker)
- `environment` - Environment to roll back (staging or production)
- `deployment` - Specific deployment or "all"
- `revision` - Kubernetes revision (0 = previous)
**Features**:
- Pre-rollback safety checks
- Deployment history snapshot
- Automatic rollback execution (Kubernetes; guided steps for Docker)
- Post-rollback verification
- Health check after rollback
- GitHub issue creation with summary
- Slack alerts
**Usage** (Kubernetes):
Via the GitHub UI:

1. Go to Actions → Rollback Deployment
2. Click "Run workflow"
3. Select:
   - Target: kubernetes
   - Environment: staging
   - Deployment: all
   - Revision: 0 (roll back to the previous revision)
4. Click "Run workflow"

To roll back to a specific revision, check `kubectl rollout history deployment/vapora-backend -n vapora` and set Revision to the desired number instead of 0.
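The Kubernetes side of a rollback maps onto `kubectl rollout undo`, whose `--to-revision` flag defaults to 0, meaning the previous revision, which matches the Revision input above. This is a hypothetical Bash sketch, not the workflow's source; `vapora-agents` and `vapora-llm-router` are assumed names, and `KUBECTL` is an override for testing.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the Kubernetes rollback step.
KUBECTL=${KUBECTL:-kubectl}

# rollback DEPLOYMENT|all [REVISION]; REVISION 0 means "previous revision".
rollback() {
  local deployment=$1 revision=${2:-0} name
  local -a names
  if [[ "$deployment" == all ]]; then
    # Revision numbers are tracked per deployment, so only "previous" is
    # meaningful when rolling back everything at once.
    if [[ "$revision" != 0 ]]; then
      echo "use revision 0 with deployment=all" >&2
      return 2
    fi
    names=(vapora-backend vapora-agents vapora-llm-router)
  else
    names=("$deployment")
  fi
  for name in "${names[@]}"; do
    "$KUBECTL" rollout undo "deployment/$name" -n vapora --to-revision="$revision" || return 1
  done
}
```

Refusing a numbered revision with `all` is a design choice of this sketch: revision 4 of one deployment has no relation to revision 4 of another.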
**Usage** (Docker):
Via the GitHub UI:

1. Go to Actions → Rollback Deployment
2. Click "Run workflow"
3. Select:
   - Target: docker
   - Environment: staging
4. Click "Run workflow"

Then follow the manual rollback guide in the workflow artifacts.
**Rollback Process**:
1. Pre-rollback checks and snapshot
2. Store current deployment history
3. Execute rollback (automatic for K8s, guided for Docker)
4. Verify rollback status
5. Check pod health
6. Generate reports
7. Create GitHub issue
8. Send Slack alert
**Verification After Rollback**:
```bash
# Kubernetes
kubectl get pods -n vapora
kubectl logs -f deployment/vapora-backend -n vapora
# Requires: kubectl port-forward -n vapora svc/vapora-backend 8001:8001
curl http://localhost:8001/health
# Docker
docker compose ps
docker compose logs backend
curl http://localhost:8001/health
```
---
## CI/CD Pipelines & Common Workflows
### Workflow 1: Local Development
1. Developer creates feature branch
2. Push to GitHub
3. [Validate & Build] triggers automatically
4. Download artifacts
5. [Deploy to Docker] manually for local testing
6. Test locally with docker compose
7. Create PR (artifact links included)
8. Merge to develop when approved
### Workflow 2: Staging Deployment
1. Merge PR to develop
2. [Validate & Build] runs automatically
3. Download artifacts
4. Run [Deploy to Kubernetes] manually with dry-run
5. Review dry-run output
6. Run [Deploy to Kubernetes] again with dry-run: false
7. [Health Check] verifies deployment
8. Staging environment live
### Workflow 3: Production Deployment
1. Code review and approval
2. Merge PR to main
3. [Validate & Build] runs automatically
4. Manual approval for production
5. Run [Deploy to Kubernetes] with dry-run: true
6. Review changes carefully
7. Run [Deploy to Kubernetes] with dry-run: false
8. [Health Check] monitoring (quick checks every 15 minutes, full diagnostics every 6 hours)
9. Production deployment complete
### Workflow 4: Emergency Rollback
1. Production issue detected
2. [Health Check] alerts in Slack
3. Investigate issue
4. Run [Rollback Deployment] manually
5. GitHub issue created automatically
6. [Health Check] verifies rollback
7. Services restored
8. Incident investigation begins
---
## Environment Configuration
### Staging Environment
- **Branch**: develop
- **Auto-deploy**: No (manual only)
- **Dry-run default**: Yes (test first)
- **Notifications**: SLACK_WEBHOOK
- **Protection**: Requires approval for merge to main
### Production Environment
- **Branch**: main
- **Auto-deploy**: No (manual only)
- **Dry-run default**: Yes (always test first)
- **Notifications**: SLACK_WEBHOOK_ALERTS
- **Protection**: Requires PR review, status checks must pass
---
## Artifacts & Downloads
All workflow artifacts are available in the Actions tab for 30-90 days:
```
Actions → [Specific Workflow] → [Run] → Artifacts
```
**Available Artifacts**:
- `deployment-artifacts` - Configuration and manifests
- `validation-logs-*` - Per-mode validation reports
- `build-logs` - CI/CD pipeline logs
- `docker-deployment-logs-*` - Docker deployment details
- `k8s-deployment-*` - Kubernetes deployment details
- `health-check-*` - Health monitoring reports
- `rollback-logs-*` - Rollback execution details
- `rollback-snapshot-*` - Pre-rollback state snapshot
---
## Troubleshooting
### Build Fails: "Config not found"
```
Solution: Ensure provisioning/schemas/ files exist and are committed
Check path references in validate-config.nu
```
### Deploy Fails: "kubeconfig not found"
```
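Solution: 1. Verify the KUBE_CONFIG_STAGING / KUBE_CONFIG_PRODUCTION secrets exist
2. Ensure the kubeconfig is base64-encoded (a single line avoids wrapping issues)
3. Test decoding without printing the credentials:
   printf '%s' "$KUBE_CONFIG_STAGING" | base64 -d > /dev/null && echo ok
4. Re-encode if corrupted: base64 < ~/.kube/config | tr -d '\n'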
```
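Printing a decoded kubeconfig writes cluster credentials to the terminal or job log; GitHub redacts the stored secret string, but a decoded form of it is not guaranteed to be redacted. This is a minimal Bash sketch that checks the secret without echoing it, assuming GNU `base64`; `check_kubeconfig_secret` is hypothetical, and the `kind: Config` test is a heuristic rather than full kubeconfig validation.

```bash
#!/usr/bin/env bash
# Hypothetical helper: sanity-check a base64 kubeconfig secret without printing it.
check_kubeconfig_secret() {
  local value=$1 decoded
  if [[ -z "$value" ]]; then
    echo "secret is empty or not set" >&2
    return 1
  fi
  # The pipeline's status is base64's, so invalid input is caught here.
  if ! decoded=$(printf '%s' "$value" | base64 -d 2>/dev/null); then
    echo "secret is not valid base64" >&2
    return 1
  fi
  # Heuristic: kubeconfig files declare "kind: Config" at the top level.
  if ! grep -q '^kind: Config' <<< "$decoded"; then
    echo "decoded secret does not look like a kubeconfig" >&2
    return 1
  fi
  echo "kubeconfig secret looks valid"
}
```

In a workflow step, pass the secret through an environment variable and call `check_kubeconfig_secret "$KUBE_CONFIG_STAGING"`; only the status message reaches the log.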
### Health Check: "No kubeconfig available"
```
Solution: Configure at least the KUBE_CONFIG_STAGING secret.
The health check tries KUBE_CONFIG_CI first, then falls back to KUBE_CONFIG_STAGING.
```
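The CI-then-staging fallback can be sketched in Bash as a function that returns the name of the first non-empty secret variable. This is an assumed reconstruction; the actual workflow logic may differ, and `pick_kubeconfig_secret` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prefer KUBE_CONFIG_CI, otherwise KUBE_CONFIG_STAGING.
# Prints the name of the variable to use, not its value.
pick_kubeconfig_secret() {
  local name
  for name in KUBE_CONFIG_CI KUBE_CONFIG_STAGING; do
    # ${!name} expands the variable whose name is stored in $name.
    if [[ -n "${!name:-}" ]]; then
      echo "$name"
      return 0
    fi
  done
  echo "No kubeconfig available" >&2
  return 1
}
```

In a step that maps both secrets into environment variables, `name=$(pick_kubeconfig_secret) && printf '%s' "${!name}" | base64 -d > "$HOME/.kube/config"` writes the selected kubeconfig (create `~/.kube` first). Returning the variable name rather than the value keeps the credential out of the log.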
### Docker Deploy: "Docker daemon not accessible"
```
Solution: GitHub-hosted Ubuntu runners (e.g. ubuntu-latest) ship with Docker; macOS runners do not.
Run deploy-docker on an Ubuntu runner or a self-hosted runner with Docker installed.
```
### Deployment Hangs: "Waiting for rollout"
```
Solution: 1. Check pod logs: kubectl logs -n vapora <pod>
2. Describe pod: kubectl describe pod -n vapora <pod>
3. Increase the rollout_timeout input when re-running the workflow
4. Check resource requests/limits in deployment.yaml
```
---
## Slack Integration
### Setup Slack Webhooks
1. Create Slack App: https://api.slack.com/apps
2. Enable Incoming Webhooks
3. Create a webhook for the #deployments channel (and a second one for critical alerts)
4. Copy webhook URL
5. Add to GitHub Secrets:
- `SLACK_WEBHOOK` - General notifications
- `SLACK_WEBHOOK_ALERTS` - Critical alerts
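Slack incoming webhooks accept an HTTP POST with a JSON body such as `{"text": "..."}`, which makes it easy to test a webhook before storing it. This is a minimal Bash sketch; `slack_payload` is a hypothetical helper that escapes only backslashes, double quotes and newlines (use `jq` for arbitrary text with other control characters).

```bash
#!/usr/bin/env bash
# Hypothetical helper: build a minimal Slack webhook JSON payload.
# Escapes backslash, double quote and newline only.
slack_payload() {
  local bs='\' q='"' s=$1
  s=${s//"$bs"/"$bs$bs"}   # \  -> \\
  s=${s//"$q"/"$bs$q"}     # "  -> \"
  s=${s//$'\n'/"${bs}n"}   # newline -> \n
  printf '{"text":"%s"}\n' "$s"
}
```

To send a test message: `curl -fsS -X POST -H 'Content-type: application/json' --data "$(slack_payload 'VAPORA webhook test')" "$SLACK_WEBHOOK"`.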
### Slack Message Examples
**Build Success**:
```
✅ VAPORA Artifact Build Complete
Mode: multiuser | Artifacts ready for deployment
```
**Deployment Success**:
```
✅ VAPORA Docker deployment successful!
Mode: multiuser | Environment: staging
```
**Health Check Alert**:
```
❌ VAPORA Health Check Failed
Target: kubernetes | Create issue for investigation
```
**Rollback Alert**:
```
🔙 VAPORA Rollback Executed
Target: kubernetes | Environment: production
Executed By: @user | Verify service health
```
---
## Security Best Practices
**Do**:
- Always run Kubernetes deployments with `dry_run: true` first
- Review artifacts before production deployment
- Enable branch protection rules on main
- Use environment secrets (staging vs production)
- Require PR reviews before merge
- Monitor health checks after deployment
- Store kubeconfig backups securely
- Rotate secrets regularly
**Don't**:
- Commit secrets to repository
- Deploy directly to production without testing
- Disable workflow validation steps
- Skip health checks after deployment
- Use same kubeconfig for all environments
- Merge unreviewed PRs
- Change production without approval
- Share kubeconfig over unencrypted channels
---
## Monitoring & Alerts
### Automated Monitoring
- **Health checks**: Every 15 minutes
- **Comprehensive diagnostics**: Every 6 hours
- **Issue creation**: On health check failures
- **Slack alerts**: On critical failures
### Manual Monitoring
```bash
# Real-time logs
kubectl logs -f deployment/vapora-backend -n vapora
# Watch pods
kubectl get pods -n vapora --watch
# Metrics (requires metrics-server)
kubectl top pods -n vapora
# Events
kubectl get events -n vapora --sort-by='.lastTimestamp'
```
---
## FAQ
**Q: Can I deploy multiple modes simultaneously?**
A: No. Workflows serialize deployments, so run them one at a time: deploy to staging first, then to production.
**Q: How do I revert a failed deployment?**
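A: Use the Rollback Deployment workflow. For Kubernetes it reverts to the previous revision automatically (or to a revision you specify); for Docker it generates a guided manual rollback.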
**Q: What if validation fails?**
A: Fix the configuration error and push again. Workflow will re-run automatically.
**Q: Can I skip health checks?**
A: No, health checks are mandatory for safety. They run automatically after each deployment.
**Q: How long do artifacts stay?**
A: 30-90 days depending on artifact type. Download and archive important ones.
**Q: What if kubeconfig expires?**
A: Update the secret in GitHub Settings → Secrets and variables → Actions with the new base64-encoded kubeconfig.
**Q: Can I deploy to multiple clusters?**
A: Yes, create separate secrets (KUBE_CONFIG_PROD_US, KUBE_CONFIG_PROD_EU) and workflows.
---
## Support & Documentation
- **Workflow Logs**: Actions → [Workflow Name] → [Run] → View logs
- **Artifacts**: Actions → [Workflow Name] → [Run] → Artifacts section
- **Issues**: GitHub Issues automatically created on failures
- **Slack**: Check #deployments channel for notifications
---
**Last Updated**: January 12, 2026
**Status**: Complete and production-ready
**Workflows**: 5 (validate-and-build, deploy-docker, deploy-kubernetes, health-check, rollback)