# gemma3-vllm-stack

Production-ready self-hosted stack for running Gemma 3 with vLLM on AMD ROCm, plus a browser chat UI suitable for publishing at `chat.bhatfamily.in`.
## What this stack provides

- Dockerized vLLM OpenAI-compatible API (`/v1`) backed by Gemma 3 on ROCm.
- Dockerized Open WebUI chat frontend connected to the local vLLM endpoint.
- Non-interactive scripts for install, restart, uninstall, and smoke testing.
- Documentation for operations, upgrades, and troubleshooting.
## Repository layout

```text
gemma3-vllm-stack/
├── .env.example
├── .gitignore
├── docker-compose.yml
├── README.md
├── backend/
│   ├── Dockerfile
│   └── config/
│       └── model.env.example
├── frontend/
│   ├── Dockerfile
│   └── config/
│       └── frontend.env.example
├── scripts/
│   ├── install.sh
│   ├── restart.sh
│   ├── test_api.sh
│   ├── test_python_client.py
│   ├── test_ui.sh
│   └── uninstall.sh
└── docs/
    ├── ARCHITECTURE.md
    ├── TROUBLESHOOTING.md
    └── UPGRADE_NOTES.md
```
## Architecture summary

- The `gemma3-vllm` service runs `vllm/vllm-openai-rocm` and exposes `http://localhost:${BACKEND_PORT}/v1`.
- The `chat-ui` service runs Open WebUI and exposes `http://localhost:${FRONTEND_PORT}`.
- Open WebUI calls `http://gemma3-vllm:8000/v1` on the internal Docker network.

Detailed architecture: `docs/ARCHITECTURE.md`.
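The service wiring above can be sketched as a `docker-compose.yml` excerpt. This is illustrative only: the service names and backend image match the summary, but the Open WebUI image tag, environment variable names, and device mappings are assumptions, not the repository's actual file.

```yaml
services:
  gemma3-vllm:
    image: vllm/vllm-openai-rocm        # ROCm build of the vLLM OpenAI server
    devices:
      - /dev/kfd                        # ROCm compute interface
      - /dev/dri                        # GPU render nodes
    environment:
      HF_TOKEN: ${HF_TOKEN}             # Hugging Face token for gated Gemma weights
    ports:
      - "${BACKEND_PORT:-8000}:8000"    # host API port -> container /v1

  chat-ui:
    image: ghcr.io/open-webui/open-webui:main   # assumed Open WebUI image
    environment:
      # The UI talks to the backend over the internal Docker network,
      # not via the host's localhost binding.
      OPENAI_API_BASE_URL: http://gemma3-vllm:8000/v1
    ports:
      - "${FRONTEND_PORT:-3000}:8080"   # Open WebUI listens on 8080 in-container
    depends_on:
      - gemma3-vllm
```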
## Prerequisites

- Ubuntu 22.04 LTS (amd64)
- AMD ROCm-compatible GPU setup with `/dev/kfd` and `/dev/dri` device nodes available
- Docker Engine and the docker compose plugin (the install script auto-installs them on Ubuntu if missing)
- Hugging Face token with access to the Gemma 3 model (set as `HF_TOKEN`)
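The prerequisites above can be checked before installing with a minimal preflight sketch (it only reports status and modifies nothing; the checks mirror the list above and are not part of the repository's scripts):

```shell
# Preflight: report presence of ROCm device nodes and Docker tooling.
preflight() {
  for dev in /dev/kfd /dev/dri; do
    [ -e "$dev" ] && echo "ok: $dev" || echo "missing: $dev"
  done
  command -v docker >/dev/null 2>&1 && echo "ok: docker" || echo "missing: docker"
  docker compose version >/dev/null 2>&1 \
    && echo "ok: docker compose" || echo "missing: docker compose"
}
preflight
```

Any `missing:` line for the device nodes usually means the ROCm kernel driver is not loaded; a `missing: docker compose` line is handled by the install script on Ubuntu.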
## Quickstart

1. Clone from your Gitea server:

   ```shell
   git clone ssh://git@git.bhatfamily.in/rbhat/gemma3-vllm-stack.git
   cd gemma3-vllm-stack
   ```

2. Create the main configuration file:

   ```shell
   cp .env.example .env
   ```

3. Edit `.env` and set at least:
   - `HF_TOKEN`
   - `VLLM_API_KEY` (recommended even on LAN)
   - `GEMMA_MODEL_ID`

   `backend/config/model.env` and `frontend/config/frontend.env` are auto-synced from `.env` by `scripts/install.sh` and `scripts/restart.sh`.

4. Install and start the stack:

   ```shell
   ./scripts/install.sh
   ```

5. Run the smoke tests:

   ```shell
   ./scripts/test_api.sh
   ./scripts/test_ui.sh
   python3 scripts/test_python_client.py
   ```

6. Open `http://localhost:3000` in a browser, or reverse proxy it externally to `https://chat.bhatfamily.in`.
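Beyond the bundled smoke tests, the API can be exercised directly once the stack is up. A minimal sketch of a chat-completions request follows; the model ID is a placeholder for whatever `GEMMA_MODEL_ID` is set to, and the commented `curl` call assumes the default `BACKEND_PORT` of 8000:

```shell
# Build a chat-completions request body. "google/gemma-3-4b-it" is a
# placeholder; substitute your configured GEMMA_MODEL_ID.
cat > /tmp/gemma_req.json <<'EOF'
{
  "model": "google/gemma-3-4b-it",
  "messages": [
    {"role": "user", "content": "Say hello in one sentence."}
  ],
  "max_tokens": 64
}
EOF

# Validate the payload locally before sending it.
python3 -m json.tool /tmp/gemma_req.json > /dev/null && echo "payload ok"

# Then send it to the running backend (requires the stack to be up):
# curl -s http://localhost:8000/v1/chat/completions \
#   -H "Authorization: Bearer ${VLLM_API_KEY}" \
#   -H "Content-Type: application/json" \
#   -d @/tmp/gemma_req.json
```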
## Operations

- Restart the stack: `./scripts/restart.sh`
- View logs: `docker compose logs --tail=200 gemma3-vllm chat-ui`
- Stop and remove stack resources: `./scripts/uninstall.sh`
- Stop/remove the stack and purge local cache, model, and UI data: `./scripts/uninstall.sh --purge`
## Upgrade workflow

```shell
git pull
docker compose pull
./scripts/restart.sh
```

More details: `docs/UPGRADE_NOTES.md`.
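Because `docker compose pull` follows whatever tag the compose file names, upgrades are only reproducible if the images are pinned. A sketch of that pattern (the tag value is hypothetical, and the repository's actual compose file may already handle this differently):

```yaml
services:
  gemma3-vllm:
    # Pin to an explicit release instead of a floating tag, so that
    # `docker compose pull` only moves when this line is edited deliberately.
    image: vllm/vllm-openai-rocm:v0.6.3   # hypothetical tag
```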
## Default endpoints

- API base URL: `http://localhost:8000/v1`
- UI URL: `http://localhost:3000`

Adjust using `.env`:

- `BACKEND_PORT`
- `FRONTEND_PORT`
- `GEMMA_MODEL_ID`
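For reference, a `.env` along these lines would move both services off the defaults. The values are illustrative; `.env.example` in the repository is the authoritative template and should be copied rather than writing this file from scratch.

```shell
HF_TOKEN=hf_xxx                        # Hugging Face token with Gemma 3 access
VLLM_API_KEY=change-me                 # recommended even on a LAN
GEMMA_MODEL_ID=google/gemma-3-4b-it    # placeholder model ID
BACKEND_PORT=8001                      # host port for the vLLM /v1 API
FRONTEND_PORT=3001                     # host port for Open WebUI
```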
## Notes for chat.bhatfamily.in

This repository intentionally does not terminate TLS. Bindings are plain HTTP on host ports and are designed for an external reverse proxy plus TLS handling (nginx/Caddy/Cloudflare Tunnel).

Current homelab edge mapping (verified 2026-04-19):

- `https://chat.bhatfamily.in` is served on `443/tcp` by the shared Caddy edge.
- Direct host ports `3000/tcp` (UI) and `8000/tcp` (API) are currently publicly reachable; restrict firewall/NAT exposure if direct internet access is not intended.
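Since the edge is a shared Caddy instance, the mapping above can be expressed as a Caddyfile sketch (hypothetical; the upstream address assumes the UI is on its default port on the same host as the proxy, which may not match the actual edge config):

```
chat.bhatfamily.in {
    # TLS is terminated here; the stack behind it speaks plain HTTP.
    reverse_proxy 127.0.0.1:3000
}
```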