Troubleshooting
ROCm devices not visible on the host
Symptoms:
- /dev/kfd missing
- /dev/dri missing
- vLLM fails to start with ROCm device errors
Checks:
ls -l /dev/kfd /dev/dri
id
getent group video
Expected:
- /dev/kfd exists
- /dev/dri directory exists
- user belongs to the video group
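A one-shot check that combines the three conditions above (a minimal sketch, plain POSIX shell):
# Report each precondition separately
[ -e /dev/kfd ] && echo "ok: /dev/kfd" || echo "missing: /dev/kfd"
[ -d /dev/dri ] && echo "ok: /dev/dri" || echo "missing: /dev/dri"
id -nG | grep -qw video && echo "ok: video group" || echo "missing: video group"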
Fixes:
sudo usermod -aG video "$USER"
newgrp video
Then verify ROCm tools (note that newgrp applies the group only to the current shell; log out and back in to apply it everywhere):
rocminfo | sed -n '1,120p'
If ROCm is not healthy, fix host ROCm installation first.
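To quickly confirm a GPU agent is listed, filter the rocminfo output (treat this filter as a sketch; field names can vary across ROCm versions):
rocminfo | grep -E 'Agent|Marketing Name|Device Type'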
Docker and Compose not available
Symptoms:
- docker: command not found
- docker compose version fails
Checks:
docker --version
docker compose version
Fix using install script (Ubuntu):
./scripts/install.sh
Manual fallback:
sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu jammy stable" | sudo tee /etc/apt/sources.list.d/docker.list >/dev/null
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker "$USER"
Log out/in after group change.
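After re-login, confirm the daemon is reachable without sudo:
docker run --rm hello-world
docker compose version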
vLLM container exits or fails healthchecks
Symptoms:
- gemma3-vllm keeps restarting
- API endpoint unavailable
Checks:
docker compose ps
docker compose logs --tail=200 gemma3-vllm
Common causes and fixes:
- Missing/invalid Hugging Face token:
grep -E '^(HF_TOKEN|GEMMA_MODEL_ID)=' .env
Ensure HF_TOKEN is set to a valid token with access to Gemma 3.
- Model ID typo:
grep '^GEMMA_MODEL_ID=' .env
Use a valid model, e.g. google/gemma-3-1b-it.
- ROCm runtime/device issues:
docker run --rm --device=/dev/kfd --device=/dev/dri --group-add video ubuntu:22.04 bash -lc 'ls -l /dev/kfd /dev/dri'
- API key mismatch between backend and UI/tests:
grep -E '^(VLLM_API_KEY|OPENAI_API_BASE_URL)=' .env frontend/config/frontend.env 2>/dev/null || true
Keep keys consistent across both files; a direct check is sketched after this list.
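A direct way to exercise the backend with the configured key (a sketch assuming the compose file publishes the vLLM port as 8000 on the host; adjust if your mapping differs):
# Expect a JSON model list if the key and backend are healthy
curl -s -H "Authorization: Bearer $VLLM_API_KEY" http://localhost:8000/v1/models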
Out-of-memory (OOM) or low VRAM errors
Symptoms:
- startup failure referencing memory allocation
- runtime generation failures
Checks:
docker compose logs --tail=300 gemma3-vllm | grep -Ei 'out of memory|oom|memory|cuda|hip|rocm'
Mitigations:
- Reduce context length in .env:
VLLM_MAX_MODEL_LEN=2048
- Lower GPU memory utilization target:
VLLM_GPU_MEMORY_UTILIZATION=0.75
- Use a smaller Gemma 3 variant in .env.
- Restart the stack:
./scripts/restart.sh
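Before restarting, you can confirm the overrides are present in .env, and afterwards that the container sees them (the second check assumes the compose file forwards VLLM_* variables into the container):
grep -E '^(VLLM_MAX_MODEL_LEN|VLLM_GPU_MEMORY_UTILIZATION)=' .env
docker compose exec gemma3-vllm env | grep '^VLLM_'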
UI loads but cannot reach vLLM backend
Symptoms:
- Browser opens UI but chat requests fail.
Checks:
docker compose ps
docker compose logs --tail=200 chat-ui
docker compose logs --tail=200 gemma3-vllm
Verify frontend backend URL:
grep -E '^OPENAI_API_BASE_URL=' frontend/config/frontend.env
Expected value:
OPENAI_API_BASE_URL=http://gemma3-vllm:8000/v1
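To test the same URL from inside the frontend container's network (assumes curl is available in the chat-ui image):
docker compose exec chat-ui curl -s http://gemma3-vllm:8000/v1/models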
Verify API directly from host:
./scripts/test_api.sh
If the API works from the host but not from the UI, recreate the frontend:
docker compose up -d --force-recreate chat-ui
Health checks and endpoint validation
Run all smoke tests:
./scripts/test_api.sh
./scripts/test_ui.sh
python3 scripts/test_python_client.py
If any test fails, inspect the corresponding service logs, then restart:
docker compose logs --tail=200 gemma3-vllm chat-ui
./scripts/restart.sh
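As a final manual probe, send a minimal chat completion directly to the backend (a sketch assuming the vLLM port is published as 8000 on the host and that the model matches GEMMA_MODEL_ID in .env):
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $VLLM_API_KEY" \
  -d '{"model": "google/gemma-3-1b-it", "messages": [{"role": "user", "content": "Say hello"}], "max_tokens": 32}'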