# Workspace {{ workspace_name | title }} - Troubleshooting Guide

**Purpose**: Common issues and solutions for {{ workspace_name }} deployment

---

## Authentication & Credentials

### Issue: "{{ primary_provider | title }} authentication failed"

**Symptoms**:

```
Error: Authentication failed
HTTP 401: Unauthorized
Invalid credentials
```

**Solutions**:

1. **Verify credentials are set**:

   ```bash
   # Check {{ primary_provider | title }} credentials
   echo "Provider credentials configured"
   ```

2. **Test with provider CLI**:

   ```bash
   {{ primary_provider | lower }} auth test
   ```

3. **Re-set credentials**:

   ```bash
   {% for var, hint in provider_env_vars %}
   export {{ var }}="your-{{ hint | lower }}"
   {% endfor %}
   ```

4. **Check provider account**: Log in to {{ provider_url }}

---

### Issue: "SSH key not found"

**Symptoms**:

```
Error: SSH key not found
Unable to load SSH public key
Permission denied (publickey)
```

**Solutions**:

1. **Check SSH key**:

   ```bash
   ls -la ~/.ssh/id_deployment.pub
   ```

2. **Generate SSH key**:

   ```bash
   ssh-keygen -t ed25519 -f ~/.ssh/id_deployment -C "provisioning@{{ workspace_name }}"
   ```

3. **Update configuration** if path is different:

   ```nickel
   ssh_key_path = "~/.ssh/your-custom-key.pub"
   ```

---

## Server Deployment

### Issue: "Server creation failed"

**Symptoms**:

```
Error: Failed to create server
Server creation timeout
```

**Causes**:

- Provider quota exceeded
- Insufficient account balance
- Zone unavailable
- Invalid configuration

**Solutions**:

1. **Check provider quota**: Log in to provider console
2. **Verify zone availability**: `provisioning zone list`
3. **Check configuration**: `nickel export infra/{{ default_infra }}/main.ncl | jq .servers`
4. **Try dry-run first**: `provisioning -c server create --infra {{ default_infra }}`
5. **Check provider status**: {{ provider_status_url }}

---

### Issue: "SSH connection refused"

**Symptoms**:

```
ssh: connect to host xxx.xxx.xxx.xxx port 22: Connection refused
```

**Causes**:

- Server hasn't finished booting (takes 2-3 minutes)
- SSH service not running
- Firewall blocking access
- Wrong IP address

**Solutions**:

1. **Wait for server boot**: Wait 2-3 minutes
2. **Check server status**: `provisioning server list --infra {{ default_infra }}`
3. **Verify IP address**: `provisioning server list --infra {{ default_infra }} --format json | jq '.[] | {hostname, public_ip}'`
4. **Test SSH**: `provisioning server ssh {{ servers[0].name }} --command "uptime"`
5. **Check firewall rules**: In provider console, allow SSH (port 22)

---

## Configuration & Validation

### Issue: "Nickel type-check failed"

**Symptoms**:

```
Error: Type checking failed
Type mismatch
```

**Solutions**:

1. **Run type-check**:

   ```bash
   cd {{ workspace_path }}
   nickel typecheck config/config.ncl
   nickel typecheck infra/{{ default_infra }}/main.ncl
   ```

2. **Check error message**: Fix the line mentioned in error
3. **Validate JSON export**: `nickel export config/config.ncl | jq .`
4. **Revert changes**: `git checkout config/config.ncl`

---

### Issue: "Workspace not found"

**Symptoms**:

```
Error: Workspace {{ workspace_name }} not found
```

**Solutions**:

1. **Check if registered**: `provisioning workspace list`
2. **Register workspace**: `provisioning workspace register {{ workspace_name }} {{ workspace_path }}`
3. **Activate workspace**: `provisioning workspace activate {{ workspace_name }}:{{ default_infra }}`
4. **Verify**: `provisioning workspace active`

---

## Service Installation

### Issue: "Kubernetes installation failed"

**Symptoms**:

```
Error: Failed to install Kubernetes
kubectl: command not found
```

**Solutions**:

1. **Verify server resources**:

   ```bash
   ssh root@{{ servers[0].name }} "free -h"  # Check memory
   ssh root@{{ servers[0].name }} "df -h"    # Check disk
   ```

2. **Check container runtime**: `ssh root@{{ servers[0].name }} "ctr version"`
3. **Increase server size if needed**: Edit `infra/{{ default_infra }}/servers.ncl` and change plan
4. **Retry installation**: `provisioning taskserv create kubernetes`

---

## Network Connectivity

### Issue: "Servers cannot communicate"

**Symptoms**:

```
ping: ICMP timeout
No route to host
```

**Solutions**:

1. **Verify private IP**: `provisioning server list --infra {{ default_infra }} --format json | jq '.[] | {hostname, private_ip}'`
2. **Test ping**: `ssh root@{{ servers[0].name }} "ping -c 3 "`
3. **Check firewall**: In provider console, verify firewall allows internal traffic
4. **Verify servers are in same zone**: `provisioning server list --infra {{ default_infra }} --format json | jq '.[] | {hostname, zone}'`

---

## Performance & Timeouts

### Issue: "Pricing command times out"

**Symptoms**: `provisioning price --infra {{ default_infra }}` takes 30+ seconds

**Solutions**:

1. **First run is slow**: Caches provider data
2. **Use cache for subsequent runs**: Cache at `data/{{ primary_provider | lower }}_prices.yaml`
3. **Refresh cache**: `rm -f data/*_cache.yaml && provisioning price --infra {{ default_infra }}`
4. **Check network**: `ping -c 5 {{ provider_api_host }}`

---

## Orchestrator Issues

### Issue: "Orchestrator health check fails"

**Symptoms**:

```
curl http://localhost:9090/health
curl: (7) Failed to connect to localhost port 9090
```

**Solutions**:

1. **Check if running**: `ps aux | grep orchestrator`
2. **Check port 9090**: `lsof -i :9090`
3. **Check logs**: `tail -f provisioning/platform/orchestrator/data/orchestrator.log`
4. **Start manually**: `cd provisioning/platform/orchestrator && ./scripts/start-orchestrator.nu --background`
5. **Verify**: `curl http://localhost:9090/health`

---

## Cleanup Issues

### Issue: "Server deletion failed"

**Symptoms**:

```
Error: Failed to delete server
Server is locked or still in use
```

**Solutions**:

1. **Delete through provider console**: Log in and manually delete
2. **Check server state**: `provisioning server list --infra {{ default_infra }} --format json | jq '.[] | {hostname, status}'`
3. **Retry deletion**: `provisioning server delete --infra {{ default_infra }}`
4. **Clean cache**: `provisioning cache clean && rm -rf {{ workspace_path }}/.providers/*/state/*`
5. **Verify**: `provisioning server list --infra {{ default_infra }}`

---

## Debug Procedures

### Check Logs

1. **Orchestrator logs**:

   ```bash
   tail -f provisioning/platform/orchestrator/data/orchestrator.log
   ```

2. **Enable debug mode**:

   ```bash
   export {{ primary_provider | upper }}_DEBUG=true
   provisioning --debug server create --infra {{ default_infra }}
   ```

3. **Server logs**:

   ```bash
   ssh root@{{ servers[0].name }} "tail -f /var/log/syslog"
   ```

---

## Diagnosis Checklist

```
[ ] Verify {{ primary_provider | title }} credentials
[ ] Check SSH key exists
[ ] Validate Nickel config
[ ] Workspace registered
[ ] Workspace activated
[ ] Workspace validated
[ ] Orchestrator running
[ ] Dry-run passes
[ ] Check provider status
[ ] Review logs
```

---

## Quick Reference

**Common Commands**:

```bash
# Validate
provisioning validate config
nu workspace.nu validate
nickel typecheck infra/{{ default_infra }}/main.ncl

# Deploy
provisioning -c server create --infra {{ default_infra }}
provisioning server create --infra {{ default_infra }}

# Verify
provisioning server list --infra {{ default_infra }}
provisioning price --infra {{ default_infra }}

# Debug
provisioning --debug server create --infra {{ default_infra }}
tail -f provisioning/platform/orchestrator/data/orchestrator.log
```

---

## Next Steps

After troubleshooting:

1. Validate configuration again
2. Perform dry-run deployment
3. Deploy infrastructure
4. Verify deployment health

For more information:

- Deployment Guide: `deployment-guide.md`
- Configuration Guide: `configuration-guide.md`
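
Two items from the Diagnosis Checklist ("Check SSH key exists" and "Orchestrator running") can be checked without the `provisioning` CLI, so they are easy to script. The following is a minimal sketch, not part of the tooling: the function names `check_ssh_key`, `check_orchestrator`, `report`, and `run_preflight` are hypothetical, and the defaults assume the key path and health URL used elsewhere in this guide.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helpers; none of these functions ship with the
# provisioning CLI. Defaults mirror the paths and URLs used in this guide.

check_ssh_key() {
  # Succeeds when the public key file exists and is not empty.
  key_path="${1:-$HOME/.ssh/id_deployment.pub}"
  [ -s "$key_path" ]
}

check_orchestrator() {
  # Succeeds when the health endpoint returns a 2xx response within 5 seconds.
  health_url="${1:-http://localhost:9090/health}"
  curl --silent --fail --max-time 5 "$health_url" >/dev/null 2>&1
}

report() {
  # Prints a checklist-style line: "[x]" for exit status 0, "[ ]" otherwise.
  if [ "$2" -eq 0 ]; then
    printf '[x] %s\n' "$1"
  else
    printf '[ ] %s\n' "$1"
  fi
}

run_preflight() {
  # Runs both checks, prints one line per check, returns the failure count.
  failures=0

  if check_ssh_key "$1"; then rc=0; else rc=1; fi
  report "Check SSH key exists" "$rc"
  failures=$((failures + rc))

  if check_orchestrator "$2"; then rc=0; else rc=1; fi
  report "Orchestrator running" "$rc"
  failures=$((failures + rc))

  return "$failures"
}
```

After sourcing the file, `run_preflight` uses the defaults above, and `run_preflight ~/.ssh/your-custom-key.pub http://localhost:9090/health` overrides them. The return value is the number of failed checks, so the function can gate a deployment script.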