# gemma3-vllm-stack
Production-ready self-hosted stack for running Gemma 3 with vLLM on AMD ROCm, plus a browser chat UI suitable for publishing at chat.bhatfamily.in.
## What this stack provides
- Dockerized vLLM OpenAI-compatible API (`/v1`) backed by Gemma 3 on ROCm.
- Dockerized Open WebUI chat frontend connected to the local vLLM endpoint.
- Non-interactive scripts for install, restart, uninstall, and smoke testing.
- Documentation for operations, upgrades, and troubleshooting.
## Repository layout

```
gemma3-vllm-stack/
├── .env.example
├── .gitignore
├── docker-compose.yml
├── README.md
├── backend/
│   ├── Dockerfile
│   └── config/
│       └── model.env.example
├── frontend/
│   ├── Dockerfile
│   └── config/
│       └── frontend.env.example
├── scripts/
│   ├── install.sh
│   ├── restart.sh
│   ├── test_api.sh
│   ├── test_python_client.py
│   ├── test_ui.sh
│   └── uninstall.sh
└── docs/
    ├── ARCHITECTURE.md
    ├── TROUBLESHOOTING.md
    └── UPGRADE_NOTES.md
```
## Architecture summary

- The `gemma3-vllm` service runs `vllm/vllm-openai-rocm` and exposes `http://localhost:${BACKEND_PORT}/v1`.
- The `chat-ui` service runs Open WebUI and exposes `http://localhost:${FRONTEND_PORT}`.
- Open WebUI calls `http://gemma3-vllm:8000/v1` on the internal Docker network.

Detailed architecture: docs/ARCHITECTURE.md.
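The wiring described above can be sketched as a minimal compose file. This is an illustrative sketch only, not the repository's actual `docker-compose.yml`; the Open WebUI image tag and `OPENAI_API_BASE_URL` variable reflect common Open WebUI usage, and the port defaults are assumptions.

```yaml
# Illustrative sketch -- see the real docker-compose.yml in the repo root.
services:
  gemma3-vllm:
    image: vllm/vllm-openai-rocm
    devices:
      - /dev/kfd        # ROCm kernel driver interface
      - /dev/dri        # GPU render nodes
    environment:
      - HF_TOKEN=${HF_TOKEN}
    ports:
      - "${BACKEND_PORT:-8000}:8000"

  chat-ui:
    image: ghcr.io/open-webui/open-webui:main
    environment:
      # Reaches the backend by service name on the internal Docker network.
      - OPENAI_API_BASE_URL=http://gemma3-vllm:8000/v1
    ports:
      - "${FRONTEND_PORT:-3000}:8080"
    depends_on:
      - gemma3-vllm
```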
## Prerequisites

- Ubuntu 22.04 LTS (amd64)
- AMD ROCm-compatible GPU setup with `/dev/kfd` and `/dev/dri` available
- Docker Engine and the docker compose plugin (the install script auto-installs them on Ubuntu if missing)
- Hugging Face token with access to the Gemma 3 model (set as `HF_TOKEN`)
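A quick pre-flight check for the prerequisites above might look like the following. The helper names are illustrative, not part of the repository's `scripts/` directory.

```shell
# Hypothetical pre-flight check mirroring the prerequisites list.
check_path() {
  # Report whether a device path exists.
  if [ -e "$1" ]; then echo "OK: $1"; else echo "MISSING: $1"; fi
}
check_cmd() {
  # Report whether a command is on PATH.
  if command -v "$1" >/dev/null 2>&1; then echo "OK: $1"; else echo "MISSING: $1"; fi
}

check_path /dev/kfd   # ROCm kernel driver interface
check_path /dev/dri   # GPU render nodes
check_cmd docker
[ -n "${HF_TOKEN:-}" ] && echo "OK: HF_TOKEN is set" || echo "MISSING: HF_TOKEN"
```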
## Quickstart

1. Clone from your Gitea server:

   ```shell
   git clone ssh://git@git.bhatfamily.in/rbhat/gemma3-vllm-stack.git
   cd gemma3-vllm-stack
   ```

2. Create configuration files:

   ```shell
   cp .env.example .env
   cp backend/config/model.env.example backend/config/model.env
   cp frontend/config/frontend.env.example frontend/config/frontend.env
   ```

3. Edit `.env` and set at least:
   - `HF_TOKEN`
   - `VLLM_API_KEY` (recommended even on LAN)

4. Install/start the stack:

   ```shell
   ./scripts/install.sh
   ```

5. Run smoke tests:

   ```shell
   ./scripts/test_api.sh
   ./scripts/test_ui.sh
   python3 scripts/test_python_client.py
   ```

6. Open a browser at `http://localhost:3000`, or reverse proxy externally to `https://chat.bhatfamily.in`.
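Once the stack is up, a manual request against the OpenAI-compatible endpoint is a quick sanity check beyond the bundled smoke tests. The default model id below is an assumption; in practice it comes from `GEMMA_MODEL_ID` in `.env`.

```shell
# Hypothetical smoke request against the OpenAI-compatible API.
BASE_URL="${BASE_URL:-http://localhost:8000/v1}"

# Only attempt the chat request if the backend is actually reachable.
if curl -fsS --max-time 2 "$BASE_URL/models" \
     -H "Authorization: Bearer ${VLLM_API_KEY:-}" >/dev/null 2>&1; then
  curl -sS "$BASE_URL/chat/completions" \
    -H "Authorization: Bearer ${VLLM_API_KEY:-}" \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"${GEMMA_MODEL_ID:-google/gemma-3-4b-it}\",
         \"messages\": [{\"role\": \"user\", \"content\": \"Say hello in one sentence.\"}],
         \"max_tokens\": 64}"
else
  echo "backend not reachable at $BASE_URL"
fi
```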
## Operations

- Restart the stack:

  ```shell
  ./scripts/restart.sh
  ```

- View logs:

  ```shell
  docker compose logs --tail=200 gemma3-vllm chat-ui
  ```

- Stop and remove stack resources:

  ```shell
  ./scripts/uninstall.sh
  ```

- Stop/remove the stack and purge local cache/model/UI data:

  ```shell
  ./scripts/uninstall.sh --purge
  ```
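After a restart, vLLM can take a while to load model weights, so the API may not answer immediately. A small polling helper (illustrative, not part of `scripts/`) can confirm when the backend is serving again:

```shell
# Hypothetical helper: poll an endpoint until it responds or retries run out.
wait_healthy() {
  url="$1"
  tries="${2:-30}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -fsS --max-time 2 "$url" >/dev/null 2>&1; then
      echo "healthy: $url"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "timed out waiting for $url"
  return 1
}

wait_healthy "http://localhost:${BACKEND_PORT:-8000}/v1/models" 2 || true
```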
## Upgrade workflow

```shell
git pull
docker compose pull
./scripts/restart.sh
```

More details: docs/UPGRADE_NOTES.md.
## Default endpoints

- API base URL: `http://localhost:8000/v1`
- UI URL: `http://localhost:3000`

Adjust via `.env`:
- `BACKEND_PORT`
- `FRONTEND_PORT`
- `GEMMA_MODEL_ID`
## Notes for chat.bhatfamily.in
This repository intentionally does not terminate TLS. Bindings are plain HTTP on host ports and are designed for external reverse proxy + TLS handling (nginx/Caddy/Cloudflare Tunnel).
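For nginx, a reverse proxy in front of the UI might look like the sketch below. Certificate paths and details are assumptions; adapt them to your own TLS setup (or use Caddy/Cloudflare Tunnel instead).

```nginx
# Illustrative nginx server block; paths and names are assumptions.
server {
    listen 443 ssl;
    server_name chat.bhatfamily.in;

    ssl_certificate     /etc/letsencrypt/live/chat.bhatfamily.in/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/chat.bhatfamily.in/privkey.pem;

    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto https;

        # Open WebUI streams responses over websockets.
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```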