# Workspace {{ workspace_name | title }} - Troubleshooting Guide

**Purpose**: Common issues and solutions for {{ workspace_name }} deployment

---

## Authentication & Credentials

### Issue: "{{ primary_provider | title }} authentication failed"

**Symptoms**:
```
Error: Authentication failed
HTTP 401: Unauthorized
Invalid credentials
```

**Solutions**:

1. **Verify credentials are set**:
   ```bash
   # Check {{ primary_provider | title }} credentials
   echo "Provider credentials configured"
   ```

2. **Test with provider CLI**:
   ```bash
   {{ primary_provider | lower }} auth test
   ```

3. **Re-set credentials**:
   ```bash
   {% for var, hint in provider_env_vars %}
   export {{ var }}="your-{{ hint | lower }}"
   {% endfor %}
   ```

4. **Check provider account**: Log in to {{ provider_url }}
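The "verify credentials are set" step can be made concrete with a small shell helper. The sketch below is only an illustration: `check_env_vars` is a hypothetical function name, and the variable names in the example call are placeholders, not the names your provider actually expects.

```bash
# Hypothetical helper: report which of the named environment variables
# are missing or empty in the exported environment.
check_env_vars() {
  missing=0
  for name in "$@"; do
    # printenv only sees exported variables, which is what a child
    # process such as a provider CLI would inherit.
    if [ -n "$(printenv "$name")" ]; then
      echo "ok: $name is set"
    else
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example call with placeholder names (substitute your provider's variables):
# check_env_vars EXAMPLE_PROVIDER_USER EXAMPLE_PROVIDER_TOKEN
```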

---

### Issue: "SSH key not found"

**Symptoms**:
```
Error: SSH key not found
Unable to load SSH public key
Permission denied (publickey)
```

**Solutions**:

1. **Check SSH key**:
   ```bash
   ls -la ~/.ssh/id_deployment.pub
   ```

2. **Generate SSH key**:
   ```bash
   ssh-keygen -t ed25519 -f ~/.ssh/id_deployment -C "provisioning@{{ workspace_name }}"
   ```

3. **Update configuration** if the key is at a different path:
   ```nickel
   ssh_key_path = "~/.ssh/your-custom-key.pub"
   ```
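As a rough sketch of the key check, the helper below confirms that both halves of a key pair are present and non-empty. `check_ssh_key_pair` is a hypothetical name. The default path in the example matches the `~/.ssh/id_deployment` key used above, but any path can be passed.

```bash
# Hypothetical helper: confirm that a private key and its .pub file
# both exist and are non-empty.
check_ssh_key_pair() {
  key="$1"   # path to the private key, for example ~/.ssh/id_deployment
  status=0
  if [ ! -s "$key" ]; then
    echo "missing or empty private key: $key" >&2
    status=1
  fi
  if [ ! -s "$key.pub" ]; then
    echo "missing or empty public key: $key.pub" >&2
    status=1
  fi
  return "$status"
}

# Example:
# check_ssh_key_pair "$HOME/.ssh/id_deployment"
# Optionally print the fingerprint to confirm the public key parses:
# ssh-keygen -l -f "$HOME/.ssh/id_deployment.pub"
```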

---

## Server Deployment

### Issue: "Server creation failed"

**Symptoms**:
```
Error: Failed to create server
Server creation timeout
```

**Causes**:
- Provider quota exceeded
- Insufficient account balance
- Zone unavailable
- Invalid configuration

**Solutions**:

1. **Check provider quota**: Log in to the provider console
2. **Verify zone availability**: `provisioning zone list`
3. **Check configuration**: `nickel export infra/{{ default_infra }}/main.ncl | jq .servers`
4. **Try dry-run first**: `provisioning -c server create --infra {{ default_infra }}`
5. **Check provider status**: {{ provider_status_url }}

---

### Issue: "SSH connection refused"

**Symptoms**:
```
ssh: connect to host xxx.xxx.xxx.xxx port 22: Connection refused
```

**Causes**:
- Server hasn't finished booting (takes 2-3 minutes)
- SSH service not running
- Firewall blocking access
- Wrong IP address

**Solutions**:

1. **Wait for server boot**: Allow 2-3 minutes after creation, then retry
2. **Check server status**: `provisioning server list --infra {{ default_infra }}`
3. **Verify IP address**: `provisioning server list --infra {{ default_infra }} --format json | jq '.[] | {hostname, public_ip}'`
4. **Test SSH**: `provisioning server ssh {{ servers[0].name }} --command "uptime"`
5. **Check firewall rules**: In the provider console, allow SSH (port 22)
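Because a freshly created server may need a few minutes before SSH answers, a retry loop is often more convenient than waiting by hand. The following is a minimal sketch: `retry_until` is a hypothetical helper, `SERVER_IP` is a placeholder, and the example probe assumes an `nc` build that supports `-z` and `-w`.

```bash
# Hypothetical helper: run a command until it succeeds or attempts run out.
# Usage: retry_until <max_attempts> <delay_seconds> <command> [args...]
retry_until() {
  max_attempts="$1"
  delay="$2"
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $attempt/$max_attempts failed" >&2
    # Do not sleep after the final attempt
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Example: probe port 22 every 15 seconds, for roughly 3 minutes in total.
# retry_until 12 15 nc -z -w 5 "$SERVER_IP" 22
```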

---

## Configuration & Validation

### Issue: "Nickel type-check failed"

**Symptoms**:
```
Error: Type checking failed
Type mismatch
```

**Solutions**:

1. **Run type-check**:
   ```bash
   cd {{ workspace_path }}
   nickel typecheck config/config.ncl
   nickel typecheck infra/{{ default_infra }}/main.ncl
   ```

2. **Check error message**: Fix the line mentioned in the error
3. **Validate JSON export**: `nickel export config/config.ncl | jq .`
4. **Revert changes**: `git checkout config/config.ncl` (discards uncommitted edits to that file)
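When several `.ncl` files changed, it can help to type-check all of them in one pass rather than file by file. The sketch below assumes `nickel` is on the `PATH`. `typecheck_all` and the `NICKEL_BIN` override are hypothetical names introduced here for illustration.

```bash
# Hypothetical helper: run `nickel typecheck` on every .ncl file under a
# directory and fail if any file does not type-check.
# NICKEL_BIN is an assumed override hook; it defaults to the nickel binary.
typecheck_all() {
  dir="$1"
  find "$dir" -type f -name '*.ncl' | sort | (
    failed=0
    while IFS= read -r file; do
      if "${NICKEL_BIN:-nickel}" typecheck "$file"; then
        echo "ok: $file"
      else
        echo "FAILED: $file" >&2
        failed=1
      fi
    done
    # The subshell's exit status becomes the function's return status
    exit "$failed"
  )
}

# Example, run from the workspace root:
# typecheck_all config && typecheck_all infra
```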

---

### Issue: "Workspace not found"

**Symptoms**:
```
Error: Workspace {{ workspace_name }} not found
```

**Solutions**:

1. **Check if registered**: `provisioning workspace list`
2. **Register workspace**: `provisioning workspace register {{ workspace_name }} {{ workspace_path }}`
3. **Activate workspace**: `provisioning workspace activate {{ workspace_name }}:{{ default_infra }}`
4. **Verify**: `provisioning workspace active`

---

## Service Installation

### Issue: "Kubernetes installation failed"

**Symptoms**:
```
Error: Failed to install Kubernetes
kubectl: command not found
```

**Solutions**:

1. **Verify server resources**:
   ```bash
   ssh root@{{ servers[0].name }} "free -h"  # Check memory
   ssh root@{{ servers[0].name }} "df -h"    # Check disk
   ```

2. **Check container runtime**: `ssh root@{{ servers[0].name }} "ctr version"`
3. **Increase server size if needed**: Edit `infra/{{ default_infra }}/servers.ncl` and change the plan
4. **Retry installation**: `provisioning taskserv create kubernetes`
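The memory check can also be scripted instead of read by eye from `free -h`. This sketch assumes a Linux server that exposes `/proc/meminfo`. `mem_total_mib` and `has_min_memory` are hypothetical helpers, and the threshold in the example is an assumption. `MemTotal` is usually a little below the nominal plan size, so leave some margin.

```bash
# Hypothetical helper: read MemTotal (in MiB) from a /proc/meminfo-style file.
mem_total_mib() {
  meminfo="${1:-/proc/meminfo}"
  # MemTotal is reported in kB; divide by 1024 and truncate to whole MiB
  awk '/^MemTotal:/ { print int($2 / 1024) }' "$meminfo"
}

# Hypothetical helper: succeed if total memory is at least the given MiB.
# Usage: has_min_memory <required_mib> [meminfo_file]
has_min_memory() {
  required_mib="$1"
  total_mib="$(mem_total_mib "${2:-/proc/meminfo}")"
  [ -n "$total_mib" ] && [ "$total_mib" -ge "$required_mib" ]
}

# Example: copy the remote meminfo locally, then test it.
# SERVER_NAME is a placeholder for one of your servers.
# ssh root@"$SERVER_NAME" "cat /proc/meminfo" > /tmp/meminfo
# has_min_memory 1800 /tmp/meminfo || echo "server may be too small"
```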

---

## Network Connectivity

### Issue: "Servers cannot communicate"

**Symptoms**:
```
ping: ICMP timeout
No route to host
```

**Solutions**:

1. **Verify private IP**: `provisioning server list --infra {{ default_infra }} --format json | jq '.[] | {hostname, private_ip}'`
2. **Test ping**: `ssh root@{{ servers[0].name }} "ping -c 3 <private-ip>"`
3. **Check firewall**: In the provider console, verify that the firewall allows internal traffic
4. **Verify servers are in the same zone**: `provisioning server list --infra {{ default_infra }} --format json | jq '.[] | {hostname, zone}'`

---

## Performance & Timeouts

### Issue: "Pricing command times out"

**Symptoms**: `provisioning price --infra {{ default_infra }}` takes 30+ seconds

**Solutions**:

1. **Expect a slow first run**: The first run fetches and caches provider data
2. **Use cache for subsequent runs**: Later runs read the cache at `data/{{ primary_provider | lower }}_prices.yaml`
3. **Refresh cache**: `rm -f data/*_prices.yaml data/*_cache.yaml && provisioning price --infra {{ default_infra }}`
4. **Check network**: `ping -c 5 {{ provider_api_host }}`
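To tell whether a slow run is a cold cache or a network problem, it can help to check how old the price cache is. The helper below is a sketch under assumptions: `cache_is_fresh` is a hypothetical name, the guide does not state any expiry period so the maximum age is a value you choose, and the example path follows the `data/<provider>_prices.yaml` pattern mentioned above.

```bash
# Hypothetical helper: succeed if a file exists and was modified less than
# max_age_minutes ago.
# Usage: cache_is_fresh <file> <max_age_minutes>
cache_is_fresh() {
  file="$1"
  max_age_minutes="$2"
  [ -f "$file" ] || return 1
  # find prints the path only when the modification time is recent enough
  [ -n "$(find "$file" -mmin "-$max_age_minutes" 2>/dev/null)" ]
}

# Example (the 24-hour limit is an arbitrary assumption):
# if cache_is_fresh "data/<provider>_prices.yaml" 1440; then
#   echo "cache is recent; a slow run points to something else"
# fi
```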

---

## Orchestrator Issues

### Issue: "Orchestrator health check fails"

**Symptoms**:
```
curl http://localhost:9090/health
curl: (7) Failed to connect to localhost port 9090
```

**Solutions**:

1. **Check if running**: `ps aux | grep orchestrator`
2. **Check port 9090**: `lsof -i :9090`
3. **Check logs**: `tail -f provisioning/platform/orchestrator/data/orchestrator.log`
4. **Start manually**: `cd provisioning/platform/orchestrator && ./scripts/start-orchestrator.nu --background`
5. **Verify**: `curl http://localhost:9090/health`

---

## Cleanup Issues

### Issue: "Server deletion failed"

**Symptoms**:
```
Error: Failed to delete server
Server is locked or still in use
```

**Solutions**:

1. **Delete through provider console**: Log in and delete the server manually
2. **Check server state**: `provisioning server list --infra {{ default_infra }} --format json | jq '.[] | {hostname, status}'`
3. **Retry deletion**: `provisioning server delete --infra {{ default_infra }}`
4. **Clean cache**: `provisioning cache clean && rm -rf {{ workspace_path }}/.providers/*/state/*`
5. **Verify**: `provisioning server list --infra {{ default_infra }}`

---

## Debug Procedures

### Check Logs

1. **Orchestrator logs**:
   ```bash
   tail -f provisioning/platform/orchestrator/data/orchestrator.log
   ```

2. **Enable debug mode**:
   ```bash
   export {{ primary_provider | upper }}_DEBUG=true
   provisioning --debug server create --infra {{ default_infra }}
   ```

3. **Server logs**:
   ```bash
   ssh root@{{ servers[0].name }} "tail -f /var/log/syslog"
   ```
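Following a log with `tail -f` shows new lines only, so a quick filter over what is already in the file can be a useful complement. The sketch below assumes that failing operations log lines containing the word "error". The guide does not specify the log format, and `recent_errors` is a hypothetical helper.

```bash
# Hypothetical helper: print the last N lines of a log that mention "error".
# Usage: recent_errors <logfile> [count]   (count defaults to 20)
recent_errors() {
  logfile="$1"
  count="${2:-20}"
  # -i: case-insensitive match; -n: prefix each match with its line number
  grep -in 'error' "$logfile" | tail -n "$count"
}

# Example:
# recent_errors provisioning/platform/orchestrator/data/orchestrator.log 10
```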

---

## Diagnosis Checklist

```
[ ] Verify {{ primary_provider | title }} credentials
[ ] Check SSH key exists
[ ] Validate Nickel config
[ ] Workspace registered
[ ] Workspace activated
[ ] Workspace validated
[ ] Orchestrator running
[ ] Dry-run passes
[ ] Check provider status
[ ] Review logs
```
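Parts of this checklist can be automated. The following sketch prints one checklist line per command, ticking the box when the command exits successfully. `run_check` is a hypothetical helper, and the example commands are taken from earlier sections of this guide. Items that need a human, such as reviewing the provider status page, stay manual.

```bash
# Hypothetical helper: run a command quietly and print a checklist line.
# Usage: run_check <label> <command> [args...]
run_check() {
  label="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "[x] $label"
  else
    echo "[ ] $label"
    return 1
  fi
}

# Examples:
# run_check "Check SSH key exists" test -f "$HOME/.ssh/id_deployment.pub"
# run_check "Validate Nickel config" nickel typecheck config/config.ncl
# run_check "Orchestrator running" curl -fsS http://localhost:9090/health
```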

---

## Quick Reference

**Common Commands**:

```bash
# Validate
provisioning validate config
nu workspace.nu validate
nickel typecheck infra/{{ default_infra }}/main.ncl

# Deploy (dry-run first, then the real run)
provisioning -c server create --infra {{ default_infra }}
provisioning server create --infra {{ default_infra }}

# Verify
provisioning server list --infra {{ default_infra }}
provisioning price --infra {{ default_infra }}

# Debug
provisioning --debug server create --infra {{ default_infra }}
tail -f provisioning/platform/orchestrator/data/orchestrator.log
```

---

## Next Steps

After troubleshooting:
1. Validate the configuration again
2. Perform a dry-run deployment
3. Deploy the infrastructure
4. Verify deployment health

For more information:
- Deployment Guide: `deployment-guide.md`
- Configuration Guide: `configuration-guide.md`