Initial production-ready Gemma 3 vLLM ROCm stack

Co-Authored-By: Oz <oz-agent@warp.dev>
Raghav
2026-04-18 22:53:38 +05:30
commit ef8537e923
18 changed files with 988 additions and 0 deletions

docs/ARCHITECTURE.md Normal file

@@ -0,0 +1,72 @@
# Architecture
## Component flow
```text
[Browser @ chat.bhatfamily.in]
|
| HTTPS (terminated externally)
v
[Host reverse proxy (external to this repo)]
|
| HTTP -> localhost:3000
v
[chat-ui container: Open WebUI]
|
| HTTP (docker internal network)
v
[gemma3-vllm container: vLLM OpenAI API @ :8000/v1]
|
| reads model weights/cache
v
[Hugging Face cache + local models dir]
|
| ROCm runtime
v
[AMD Radeon 780M (RDNA3 iGPU) via /dev/kfd + /dev/dri]
```
## Services
### `gemma3-vllm`
- Image: `vllm/vllm-openai-rocm:latest`
- Purpose: Run the Gemma 3 instruction-tuned model behind an OpenAI-compatible API.
- Host port mapping: `${BACKEND_PORT}:8000` (default `8000:8000`)
- Device passthrough:
- `/dev/kfd`
- `/dev/dri`
- Security/capabilities for ROCm debugging compatibility:
- `cap_add: SYS_PTRACE`
- `security_opt: seccomp=unconfined`
- `group_add: video`
### `chat-ui`
- Image: `ghcr.io/open-webui/open-webui:main`
- Purpose: Provide the browser chat experience, with local persistence in the mounted data directory.
- Host port mapping: `${FRONTEND_PORT}:8080` (default `3000:8080`)
- Upstream model endpoint on docker network:
- `OPENAI_API_BASE_URL=http://gemma3-vllm:8000/v1`
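The two services above can be sketched as a minimal `docker-compose.yml` fragment. This is a sketch only, not the repository's actual compose file: the env-var defaults and the `depends_on` ordering are assumptions added for illustration.

```yaml
services:
  gemma3-vllm:
    image: vllm/vllm-openai-rocm:latest
    ports:
      - "${BACKEND_PORT:-8000}:8000"
    devices:
      - /dev/kfd
      - /dev/dri
    group_add:
      - video
    cap_add:
      - SYS_PTRACE
    security_opt:
      - seccomp=unconfined

  chat-ui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "${FRONTEND_PORT:-3000}:8080"
    environment:
      - OPENAI_API_BASE_URL=http://gemma3-vllm:8000/v1
    depends_on:
      - gemma3-vllm
```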
## Networking
- Docker Compose default bridge network is used.
- `chat-ui` resolves `gemma3-vllm` by service name.
- External access is via host ports:
- API: `localhost:8000`
- UI: `localhost:3000`
## Storage
- Hugging Face cache bind mount:
- Host: `${HUGGINGFACE_CACHE_DIR}`
- Container: `/root/.cache/huggingface`
- Optional local models directory:
- Host: `./models`
- Container: `/models`
- Open WebUI data:
- Host: `${OPEN_WEBUI_DATA_DIR}`
- Container: `/app/backend/data`
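Before the first `docker compose up`, it helps to make sure the host-side bind-mount paths exist, since Docker creates missing host directories as root-owned. A minimal pre-flight sketch, assuming the default paths above (`HUGGINGFACE_CACHE_DIR` and `OPEN_WEBUI_DATA_DIR` come from `.env`; the fallback values are assumptions):

```shell
# Pre-flight sketch: ensure host-side bind-mount directories exist so
# Docker does not create them root-owned. Default paths are assumptions.
set -u

HUGGINGFACE_CACHE_DIR="${HUGGINGFACE_CACHE_DIR:-$HOME/.cache/huggingface}"
OPEN_WEBUI_DATA_DIR="${OPEN_WEBUI_DATA_DIR:-./frontend/data/open-webui}"

for dir in "$HUGGINGFACE_CACHE_DIR" "$OPEN_WEBUI_DATA_DIR" "./models"; do
  # mkdir -p is idempotent; existing directories are left untouched.
  mkdir -p "$dir"
done
echo "mount directories ready"
```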
## Scaling notes
This repository is designed for **single-node deployment** on one AMD APU/GPU host.
For larger deployments later:
- Move to dedicated GPUs with larger VRAM.
- Use pinned vLLM image tags and explicit engine tuning.
- Consider externalized model storage and distributed orchestration (Kubernetes/Swarm/Nomad).
- Add request routing, autoscaling, and centralized observability.

docs/README.md Normal file

@@ -0,0 +1,14 @@
# Documentation Index
This folder contains operational and lifecycle documentation for the `gemma3-vllm-stack` repository.
## Files
- `ARCHITECTURE.md`: Component topology, networking, runtime dependencies, and scaling notes.
- `TROUBLESHOOTING.md`: Common failures and copy-paste diagnostics/fixes for ROCm, Docker, vLLM, and UI issues.
- `UPGRADE_NOTES.md`: Safe upgrade, rollback, and backup guidance.
## Recommended reading order
1. `ARCHITECTURE.md`
2. `TROUBLESHOOTING.md`
3. `UPGRADE_NOTES.md`
For quick start and day-one usage, see the repository root `README.md`.

docs/TROUBLESHOOTING.md Normal file

@@ -0,0 +1,172 @@
# Troubleshooting
## ROCm devices not visible on the host
Symptoms:
- `/dev/kfd` missing
- `/dev/dri` missing
- vLLM fails to start with ROCm device errors
Checks:
```bash
ls -l /dev/kfd /dev/dri
id
getent group video
```
Expected:
- `/dev/kfd` exists
- `/dev/dri` directory exists
- user belongs to `video` group
Fixes:
```bash
sudo usermod -aG video "$USER"
newgrp video
```
Then verify ROCm tools:
```bash
rocminfo | sed -n '1,120p'
```
If ROCm is not healthy, fix the host ROCm installation first.
---
## Docker and Compose not available
Symptoms:
- `docker: command not found`
- `docker compose version` fails
Checks:
```bash
docker --version
docker compose version
```
Fix using install script (Ubuntu):
```bash
./scripts/install.sh
```
Manual fallback:
```bash
sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | sudo tee /etc/apt/sources.list.d/docker.list >/dev/null
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker "$USER"
```
Log out and back in after the group change.
---
## vLLM container exits or fails healthchecks
Symptoms:
- `gemma3-vllm` restarting
- API endpoint unavailable
Checks:
```bash
docker compose ps
docker compose logs --tail=200 gemma3-vllm
```
Common causes and fixes:
1. Missing/invalid Hugging Face token:
```bash
grep -E '^(HF_TOKEN|GEMMA_MODEL_ID)=' .env
```
Ensure `HF_TOKEN` is set to a valid token with access to Gemma 3.
2. Model ID typo:
```bash
grep '^GEMMA_MODEL_ID=' .env
```
Use a valid model ID, e.g. `google/gemma-3-1b-it`.
3. ROCm runtime/device issues:
```bash
docker run --rm --device=/dev/kfd --device=/dev/dri --group-add video ubuntu:22.04 bash -lc 'ls -l /dev/kfd /dev/dri'
```
4. API key mismatch between backend and UI/tests:
```bash
grep -E '^(VLLM_API_KEY|OPENAI_API_BASE_URL)=' .env frontend/config/frontend.env 2>/dev/null || true
```
Keep keys consistent.
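Causes 1, 2, and 4 above can be caught in one pass with a small sanity check. The sketch below is illustrative, not a script from this repository: `check_env` prints `OK`/`MISSING` per key and returns non-zero if anything is absent, demonstrated here on a throwaway example file.

```shell
# Hypothetical sanity check for the .env keys referenced above.
set -u

check_env() {
  local env_file="$1" key value missing=0
  for key in HF_TOKEN GEMMA_MODEL_ID VLLM_API_KEY; do
    # Take the value after the first '=' on the key's line, if any.
    value="$(grep -E "^${key}=" "$env_file" | head -n1 | cut -d= -f2-)"
    if [ -z "$value" ]; then
      echo "MISSING: $key"
      missing=1
    else
      echo "OK: $key"
    fi
  done
  return "$missing"
}

# Demo: an example .env that is missing VLLM_API_KEY.
printf 'HF_TOKEN=hf_example\nGEMMA_MODEL_ID=google/gemma-3-1b-it\n' > /tmp/example.env
check_env /tmp/example.env || echo "fix the keys above before restarting"
```

Point `check_env` at the real `.env` (and `frontend/config/frontend.env`) to verify all keys in one step.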
---
## Out-of-memory (OOM) or low VRAM errors
Symptoms:
- startup failure referencing memory allocation
- runtime generation failures
Checks:
```bash
docker compose logs --tail=300 gemma3-vllm | grep -Ei 'out of memory|oom|memory|cuda|hip|rocm'
```
Mitigations:
1. Reduce context length in `.env`:
```bash
VLLM_MAX_MODEL_LEN=2048
```
2. Lower GPU memory utilization target:
```bash
VLLM_GPU_MEMORY_UTILIZATION=0.75
```
3. Use a smaller Gemma 3 variant in `.env`.
4. Restart stack:
```bash
./scripts/restart.sh
```
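Mitigations 1 and 2 can be applied in a single pass with `sed`. The sketch below operates on a throwaway copy so it is safe to run anywhere; swap in the real `.env` once the substitutions look right (GNU `sed -i` syntax, and it assumes both keys already exist in the file).

```shell
# Demo on a throwaway copy of .env with the pre-mitigation values.
printf 'VLLM_MAX_MODEL_LEN=4096\nVLLM_GPU_MEMORY_UTILIZATION=0.90\n' > /tmp/demo.env

# Apply both memory mitigations in one pass (in-place edit).
sed -i \
  -e 's/^VLLM_MAX_MODEL_LEN=.*/VLLM_MAX_MODEL_LEN=2048/' \
  -e 's/^VLLM_GPU_MEMORY_UTILIZATION=.*/VLLM_GPU_MEMORY_UTILIZATION=0.75/' \
  /tmp/demo.env

cat /tmp/demo.env
```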
---
## UI loads but cannot reach vLLM backend
Symptoms:
- Browser opens UI but chat requests fail.
Checks:
```bash
docker compose ps
docker compose logs --tail=200 chat-ui
docker compose logs --tail=200 gemma3-vllm
```
Verify frontend backend URL:
```bash
grep -E '^OPENAI_API_BASE_URL=' frontend/config/frontend.env
```
Expected value:
```text
OPENAI_API_BASE_URL=http://gemma3-vllm:8000/v1
```
Verify API directly from host:
```bash
./scripts/test_api.sh
```
If the API works from the host but not from the UI, recreate the frontend:
```bash
docker compose up -d --force-recreate chat-ui
```
---
## Health checks and endpoint validation
Run all smoke tests:
```bash
./scripts/test_api.sh
./scripts/test_ui.sh
python3 scripts/test_python_client.py
```
If one fails, inspect corresponding service logs and then restart:
```bash
docker compose logs --tail=200 gemma3-vllm chat-ui
./scripts/restart.sh
```

docs/UPGRADE_NOTES.md Normal file

@@ -0,0 +1,50 @@
# Upgrade Notes
## Standard safe upgrade path
From repository root:
```bash
git pull
docker compose pull
./scripts/restart.sh
```
Then run smoke tests:
```bash
./scripts/test_api.sh
./scripts/test_ui.sh
python3 scripts/test_python_client.py
```
## Versioning guidance
- Prefer pinning image tags in `docker-compose.yml` once your deployment is stable.
- Upgrading vLLM may change runtime defaults or engine behavior; check vLLM release notes before major version jumps.
- Keep `GEMMA_MODEL_ID` explicit in `.env` to avoid unintentional model drift.
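Pinning can be sketched as a one-line change in `docker-compose.yml`. The placeholder below is deliberate: substitute a tag or digest you have actually verified against the vLLM ROCm image registry, not `latest`.

```yaml
services:
  gemma3-vllm:
    # Pin by tag, or better by digest, instead of :latest.
    # <pinned-tag> is a placeholder; pick a verified release.
    image: vllm/vllm-openai-rocm:<pinned-tag>
```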
## Model upgrade considerations
When changing Gemma 3 variants (for example, from 1B to larger sizes):
- Verify host RAM and GPU memory capacity.
- Expect re-download of model weights and larger disk usage.
- Re-tune:
- `VLLM_MAX_MODEL_LEN`
- `VLLM_GPU_MEMORY_UTILIZATION`
- Re-run validation scripts after restart.
## Backup recommendations
Before major upgrades, back up local persistent data:
```bash
mkdir -p backups
tar -czf backups/hf-cache-$(date +%Y%m%d-%H%M%S).tar.gz "${HUGGINGFACE_CACHE_DIR:-$HOME/.cache/huggingface}"
tar -czf backups/open-webui-data-$(date +%Y%m%d-%H%M%S).tar.gz frontend/data/open-webui
```
If you use locally pre-downloaded models:
```bash
tar -czf backups/models-$(date +%Y%m%d-%H%M%S).tar.gz models
```
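Before relying on any of these archives for a rollback, confirm they are readable: `tar -tzf` lists the contents without extracting and fails on a corrupt archive. The sketch below demonstrates the check on a throwaway archive; run the same `tar -tzf` against the real files in `backups/`.

```shell
# Demo: build a throwaway archive, then verify it is readable.
mkdir -p /tmp/demo-models
touch /tmp/demo-models/weights.bin
tar -czf /tmp/demo-models.tar.gz -C /tmp demo-models

# The actual verification step: list contents, discard output.
tar -tzf /tmp/demo-models.tar.gz >/dev/null && echo "archive OK"
```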
## Rollback approach
If a new image/model combination fails:
1. Revert `docker-compose.yml` and `.env` to previous known-good values.
2. Pull previous pinned images (if pinned by tag/digest).
3. Restart:
```bash
./scripts/restart.sh
```
4. Re-run smoke tests.