Initial production-ready Gemma 3 vLLM ROCm stack
Co-Authored-By: Oz <oz-agent@warp.dev>
# gemma3-vllm-stack

Production-ready, self-hosted stack for running **Gemma 3** with **vLLM** on AMD ROCm, plus a browser chat UI suitable for publishing at `chat.bhatfamily.in`.
## What this stack provides

- Dockerized **vLLM OpenAI-compatible API** (`/v1`) backed by Gemma 3 on ROCm.
- Dockerized **Open WebUI** chat frontend connected to the local vLLM endpoint.
- Non-interactive scripts for install, restart, uninstall, and smoke testing.
- Documentation for operations, upgrades, and troubleshooting.
## Repository layout

```text
gemma3-vllm-stack/
├── .env.example
├── .gitignore
├── docker-compose.yml
├── README.md
├── backend/
│   ├── Dockerfile
│   └── config/
│       └── model.env.example
├── frontend/
│   ├── Dockerfile
│   └── config/
│       └── frontend.env.example
├── scripts/
│   ├── install.sh
│   ├── restart.sh
│   ├── test_api.sh
│   ├── test_python_client.py
│   ├── test_ui.sh
│   └── uninstall.sh
└── docs/
    ├── ARCHITECTURE.md
    ├── TROUBLESHOOTING.md
    └── UPGRADE_NOTES.md
```
## Architecture summary

- The `gemma3-vllm` service runs `vllm/vllm-openai-rocm` and exposes `http://localhost:${BACKEND_PORT}/v1`.
- The `chat-ui` service runs Open WebUI and exposes `http://localhost:${FRONTEND_PORT}`.
- Open WebUI calls `http://gemma3-vllm:8000/v1` on the internal Docker network.

Detailed architecture: `docs/ARCHITECTURE.md`.
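The service wiring described in the architecture summary can be sketched in `docker-compose.yml` roughly as follows. This is an illustrative fragment, not the repository's actual file: the Open WebUI image tag, the in-container ports, and the `OPENAI_API_BASE_URL` variable name are assumptions to be checked against the real compose file.

```yaml
services:
  gemma3-vllm:
    image: vllm/vllm-openai-rocm        # ROCm build of the vLLM OpenAI server
    devices:
      - /dev/kfd                        # ROCm compute interface
      - /dev/dri                        # GPU render nodes
    env_file: backend/config/model.env
    ports:
      - "${BACKEND_PORT:-8000}:8000"

  chat-ui:
    image: ghcr.io/open-webui/open-webui:main
    environment:
      # Reaches the backend by service name on the internal Docker network.
      OPENAI_API_BASE_URL: http://gemma3-vllm:8000/v1
    ports:
      - "${FRONTEND_PORT:-3000}:8080"   # Open WebUI listens on 8080 internally
```

Because both containers share the default compose network, the UI resolves `gemma3-vllm` by service name and never needs the host ports.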
## Prerequisites

- Ubuntu 22.04 LTS (amd64)
- An AMD ROCm-compatible GPU setup exposing:
  - `/dev/kfd`
  - `/dev/dri`
- Docker Engine and the docker compose plugin (the install script auto-installs these on Ubuntu if missing)
- A Hugging Face token with access to the Gemma 3 model (set as `HF_TOKEN`)
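A quick pre-flight check for the device nodes listed above can save a failed container start. This is a hypothetical helper, not part of the repository's scripts; `install.sh` may perform its own checks.

```shell
#!/bin/sh
# Report whether each ROCm device node required by the containers exists.
check_devices() {
  for dev in "$@"; do
    if [ -e "$dev" ]; then
      echo "present $dev"
    else
      echo "absent $dev"
    fi
  done
}

check_devices /dev/kfd /dev/dri
```

If either node is reported absent, fix the ROCm driver installation before running `./scripts/install.sh`.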
## Quickstart

1. Clone from your Gitea server:

   ```bash
   git clone ssh://git@git.bhatfamily.in/rbhat/gemma3-vllm-stack.git
   cd gemma3-vllm-stack
   ```

2. Create the configuration files:

   ```bash
   cp .env.example .env
   cp backend/config/model.env.example backend/config/model.env
   cp frontend/config/frontend.env.example frontend/config/frontend.env
   ```

3. Edit `.env` and set at least:

   - `HF_TOKEN`
   - `VLLM_API_KEY` (recommended even on a LAN)

4. Install and start the stack:

   ```bash
   ./scripts/install.sh
   ```

5. Run the smoke tests:

   ```bash
   ./scripts/test_api.sh
   ./scripts/test_ui.sh
   python3 scripts/test_python_client.py
   ```

6. Open the UI in a browser:

   - `http://localhost:3000`
   - Reverse proxy externally to `https://chat.bhatfamily.in`
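Right after `install.sh` returns, the backend may still be loading model weights, so the smoke tests can fail transiently. A small polling helper like the following can bridge that gap; it is a sketch, and the real scripts may already retry on their own.

```shell
#!/bin/sh
# Poll a URL until it answers, up to a given number of attempts (default 30).
wait_for() {
  url=$1
  tries=${2:-30}
  i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -fsS "$url" >/dev/null 2>&1; then
      echo "up: $url"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "timed out: $url"
  return 1
}
```

Typical usage before the tests: `wait_for "http://localhost:${BACKEND_PORT:-8000}/v1/models"`.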
## Operations

- Restart the stack:

  ```bash
  ./scripts/restart.sh
  ```

- View logs:

  ```bash
  docker compose logs --tail=200 gemma3-vllm chat-ui
  ```

- Stop and remove stack resources:

  ```bash
  ./scripts/uninstall.sh
  ```

- Stop and remove the stack, and purge local cache, model, and UI data:

  ```bash
  ./scripts/uninstall.sh --purge
  ```
## Upgrade workflow

```bash
git pull
docker compose pull
./scripts/restart.sh
```

More details: `docs/UPGRADE_NOTES.md`.
## Default endpoints

- API base URL: `http://localhost:8000/v1`
- UI URL: `http://localhost:3000`

Adjust these via `.env`:

- `BACKEND_PORT`
- `FRONTEND_PORT`
- `GEMMA_MODEL_ID`
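A minimal `.env` might look like the following. All values are illustrative assumptions; in particular, pick a Gemma 3 variant that your `HF_TOKEN` actually has access to.

```shell
# .env -- example values only
BACKEND_PORT=8000
FRONTEND_PORT=3000
GEMMA_MODEL_ID=google/gemma-3-4b-it   # assumed variant; any Gemma 3 ID works
HF_TOKEN=hf_xxx                       # your Hugging Face access token
VLLM_API_KEY=change-me                # shared secret for the /v1 API
```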
## Notes for `chat.bhatfamily.in`

This repository intentionally does not terminate TLS. Services bind plain HTTP on host ports; TLS and external exposure are expected to be handled by a separate reverse proxy (nginx, Caddy, or a Cloudflare Tunnel).
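For example, an nginx server block in front of the UI might look like this. It is a sketch under the assumption that nginx terminates TLS on the same host; the certificate directives are omitted and left to your certbot or equivalent setup.

```nginx
server {
    listen 443 ssl;
    server_name chat.bhatfamily.in;
    # ssl_certificate / ssl_certificate_key directives go here

    location / {
        proxy_pass http://127.0.0.1:3000;      # host port from FRONTEND_PORT
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto https;
        # Open WebUI streams responses over WebSockets.
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```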