# gemma3-vllm-stack

Production-ready self-hosted stack for running **Gemma 3** with **vLLM** on AMD ROCm, plus a browser chat UI suitable for publishing at `chat.bhatfamily.in`.

## What this stack provides

- Dockerized **vLLM OpenAI-compatible API** (`/v1`) backed by Gemma 3 on ROCm.
- Dockerized **Open WebUI** chat frontend connected to the local vLLM endpoint.
- Non-interactive scripts for install, restart, uninstall, and smoke testing.
- Documentation for operations, upgrades, and troubleshooting.

## Repository layout

```text
gemma3-vllm-stack/
├── .env.example
├── .gitignore
├── docker-compose.yml
├── README.md
├── backend/
│   ├── Dockerfile
│   └── config/
│       └── model.env.example
├── frontend/
│   ├── Dockerfile
│   └── config/
│       └── frontend.env.example
├── scripts/
│   ├── install.sh
│   ├── restart.sh
│   ├── test_api.sh
│   ├── test_python_client.py
│   ├── test_ui.sh
│   └── uninstall.sh
└── docs/
    ├── ARCHITECTURE.md
    ├── TROUBLESHOOTING.md
    └── UPGRADE_NOTES.md
```

## Architecture summary

- The `gemma3-vllm` service runs `vllm/vllm-openai-rocm` and exposes `http://localhost:${BACKEND_PORT}/v1`.
- The `chat-ui` service runs Open WebUI and exposes `http://localhost:${FRONTEND_PORT}`.
- Open WebUI calls `http://gemma3-vllm:8000/v1` on the internal Docker network.

Detailed architecture: `docs/ARCHITECTURE.md`.

## Prerequisites

- Ubuntu 22.04 LTS (amd64)
- AMD ROCm-compatible GPU setup with:
  - `/dev/kfd`
  - `/dev/dri`
- Docker Engine and the docker compose plugin (`scripts/install.sh` auto-installs them on Ubuntu if missing)
- Hugging Face token with access to the Gemma 3 model (set as `HF_TOKEN`)

## Quickstart

1. Clone from your Gitea server:

   ```bash
   git clone ssh://git@git.bhatfamily.in/rbhat/gemma3-vllm-stack.git
   cd gemma3-vllm-stack
   ```

2. Create the main configuration file:

   ```bash
   cp .env.example .env
   ```

3. Edit `.env` and set at least:

   - `HF_TOKEN`
   - `VLLM_API_KEY` (recommended even on a LAN)
   - `GEMMA_MODEL_ID`

   `backend/config/model.env` and `frontend/config/frontend.env` are auto-synced from `.env` by `scripts/install.sh` and `scripts/restart.sh`.

4. Install and start the stack:

   ```bash
   ./scripts/install.sh
   ```

5. Run the smoke tests:

   ```bash
   ./scripts/test_api.sh
   ./scripts/test_ui.sh
   python3 scripts/test_python_client.py
   ```

6. Open a browser:

   - `http://localhost:3000`
   - Reverse proxy externally to `https://chat.bhatfamily.in`

## Operations

- Restart the stack:

  ```bash
  ./scripts/restart.sh
  ```

- View logs:

  ```bash
  docker compose logs --tail=200 gemma3-vllm chat-ui
  ```

- Stop and remove stack resources:

  ```bash
  ./scripts/uninstall.sh
  ```

- Stop/remove the stack and purge local cache/model/UI data:

  ```bash
  ./scripts/uninstall.sh --purge
  ```

## Upgrade workflow

```bash
git pull
docker compose pull
./scripts/restart.sh
```

More details: `docs/UPGRADE_NOTES.md`.

## Default endpoints

- API base URL: `http://localhost:8000/v1`
- UI URL: `http://localhost:3000`

Adjust via `.env`:

- `BACKEND_PORT`
- `FRONTEND_PORT`
- `GEMMA_MODEL_ID`

## Notes for `chat.bhatfamily.in`

This repository intentionally does not terminate TLS. Bindings are plain HTTP on host ports and are designed for an external reverse proxy to handle TLS (nginx/Caddy/Cloudflare Tunnel).

Current homelab edge mapping (verified 2026-04-19):

- `https://chat.bhatfamily.in` is served on `443/tcp` by the shared Caddy edge.
- Direct host ports `3000/tcp` (UI) and `8000/tcp` (API) are currently publicly reachable; restrict firewall/NAT exposure if direct internet access is not intended.
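Since the shared Caddy edge terminates TLS for `chat.bhatfamily.in`, a minimal Caddyfile sketch for that site block might look like the following. This is illustrative only: it assumes Caddy v2 and that the Open WebUI host port (`3000` by default) is reachable from the edge host as `localhost:3000`; substitute the real upstream address if the edge runs on a different machine.

```caddyfile
# Hypothetical site block on the shared Caddy edge.
# Caddy v2 provisions and renews the TLS certificate automatically.
chat.bhatfamily.in {
    # Forward all traffic to the Open WebUI container's published host port.
    reverse_proxy localhost:3000
}
```

Note that this exposes only the chat UI; the vLLM API on port `8000` stays unproxied, which matches the recommendation above to restrict direct firewall/NAT exposure of the API port.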