gemma3-vllm-stack

A production-ready, self-hosted stack for running Gemma 3 with vLLM on AMD ROCm, plus a browser chat UI suitable for publishing at chat.bhatfamily.in.

What this stack provides

  • Dockerized vLLM OpenAI-compatible API (/v1) backed by Gemma 3 on ROCm.
  • Dockerized Open WebUI chat frontend connected to the local vLLM endpoint.
  • Non-interactive scripts for install, restart, uninstall, and smoke testing.
  • Documentation for operations, upgrades, and troubleshooting.

Repository layout

gemma3-vllm-stack/
├── .env.example
├── .gitignore
├── docker-compose.yml
├── README.md
├── backend/
│   ├── Dockerfile
│   └── config/
│       └── model.env.example
├── frontend/
│   ├── Dockerfile
│   └── config/
│       └── frontend.env.example
├── scripts/
│   ├── install.sh
│   ├── restart.sh
│   ├── test_api.sh
│   ├── test_python_client.py
│   ├── test_ui.sh
│   └── uninstall.sh
└── docs/
    ├── ARCHITECTURE.md
    ├── TROUBLESHOOTING.md
    └── UPGRADE_NOTES.md

Architecture summary

  • gemma3-vllm service runs vllm/vllm-openai-rocm and exposes http://localhost:${BACKEND_PORT}/v1.
  • chat-ui service runs Open WebUI and exposes http://localhost:${FRONTEND_PORT}.
  • Open WebUI calls http://gemma3-vllm:8000/v1 on the internal Docker network.

Detailed architecture: docs/ARCHITECTURE.md.
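
To confirm the backend wiring described above, you can query the OpenAI-compatible models endpoint directly. This is a sketch that assumes the default backend port (8000) and an API key exported as VLLM_API_KEY; adjust both to match your .env.

```shell
# Sketch: list the models the vLLM backend is serving.
# Assumptions: BACKEND_PORT defaults to 8000, VLLM_API_KEY is set in the environment.
BASE_URL="http://localhost:${BACKEND_PORT:-8000}/v1"
curl -s --max-time 5 \
  -H "Authorization: Bearer ${VLLM_API_KEY:-changeme}" \
  "${BASE_URL}/models" || echo "backend not reachable at ${BASE_URL}"
```

A healthy backend returns a JSON object whose data array contains the configured GEMMA_MODEL_ID.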

Prerequisites

  • Ubuntu 22.04 LTS (amd64)
  • AMD ROCm-compatible GPU setup with:
    • /dev/kfd
    • /dev/dri
  • Docker Engine and the Docker Compose plugin (scripts/install.sh installs them automatically on Ubuntu if missing)
  • A Hugging Face token with access to the Gemma 3 model weights (set as HF_TOKEN in .env)

Quickstart

  1. Clone from your Gitea server:

    git clone ssh://git@git.bhatfamily.in/rbhat/gemma3-vllm-stack.git
    cd gemma3-vllm-stack
    
  2. Create the main configuration file:

    cp .env.example .env
    
  3. Edit .env and set at least:

    • HF_TOKEN
    • VLLM_API_KEY (recommended even on LAN)
    • GEMMA_MODEL_ID

    backend/config/model.env and frontend/config/frontend.env are auto-synced from .env by scripts/install.sh and scripts/restart.sh.
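
    A minimal .env sketch is shown below. Every value is a placeholder: the model ID is one example Gemma 3 checkpoint, and you should substitute whichever checkpoint your Hugging Face token can access.

```shell
# Example .env values; all placeholders, substitute your own.
# Your Hugging Face access token:
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx
# Any long random string works, e.g. the output of: openssl rand -hex 32
VLLM_API_KEY=replace-with-a-long-random-string
# Example instruction-tuned Gemma 3 checkpoint; use one your token can access.
GEMMA_MODEL_ID=google/gemma-3-4b-it
BACKEND_PORT=8000
FRONTEND_PORT=3000
```

    Note: some .env parsers treat inline comments as part of the value, so comments are kept on their own lines here.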

  4. Install/start stack:

    ./scripts/install.sh
    
  5. Run smoke tests:

    ./scripts/test_api.sh
    ./scripts/test_ui.sh
    python3 scripts/test_python_client.py
    
  6. Open the UI in a browser:

    • http://localhost:3000 on the host
    • https://chat.bhatfamily.in externally, once the reverse proxy is in place
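
The API smoke test can also be approximated by hand with a single chat completion request. This is a sketch: it assumes the default backend port, VLLM_API_KEY in the environment, and uses google/gemma-3-4b-it as an example model name, which must match your GEMMA_MODEL_ID.

```shell
# Manual chat-completion probe against the OpenAI-compatible endpoint.
# Assumptions: BACKEND_PORT defaults to 8000; the model name matches GEMMA_MODEL_ID.
RESPONSE=$(curl -s --max-time 15 "http://localhost:${BACKEND_PORT:-8000}/v1/chat/completions" \
  -H "Authorization: Bearer ${VLLM_API_KEY:-changeme}" \
  -H "Content-Type: application/json" \
  -d '{"model": "google/gemma-3-4b-it", "messages": [{"role": "user", "content": "Say hello in one sentence."}]}' \
  || echo '{"error": "backend not reachable"}')
echo "${RESPONSE}"
```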

Operations

  • Restart stack:

    ./scripts/restart.sh
    
  • View logs:

    docker compose logs --tail=200 gemma3-vllm chat-ui
    
  • Stop and remove stack resources:

    ./scripts/uninstall.sh
    
  • Stop/remove stack and purge local cache/model/UI data:

    ./scripts/uninstall.sh --purge
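
  • For a quick liveness check between restarts, you can probe the backend's health endpoint (a sketch; it assumes the default backend port, and relies on vLLM's OpenAI server exposing GET /health):

```shell
# Probe the vLLM backend health endpoint; /health returns 200 when the server is up.
STATUS=$(curl -fsS --max-time 5 "http://localhost:${BACKEND_PORT:-8000}/health" \
  >/dev/null 2>&1 && echo healthy || echo down)
echo "backend: ${STATUS}"
```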
    

Upgrade workflow

git pull
docker compose pull
./scripts/restart.sh

More details: docs/UPGRADE_NOTES.md.

Default endpoints

  • API base URL: http://localhost:8000/v1
  • UI URL: http://localhost:3000

Adjust using .env:

  • BACKEND_PORT
  • FRONTEND_PORT
  • GEMMA_MODEL_ID

Notes for chat.bhatfamily.in

This repository intentionally does not terminate TLS. Services bind plain HTTP on host ports and are designed to sit behind an external reverse proxy that handles TLS (nginx, Caddy, or Cloudflare Tunnel).
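
For example, a minimal Caddy site block on the edge host might look like the sketch below; "homelab-host" is a placeholder for the machine running this stack, and Caddy obtains the TLS certificate automatically.

```
# Hypothetical Caddyfile entry on the shared edge proxy.
chat.bhatfamily.in {
    reverse_proxy homelab-host:3000
}
```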

Current homelab edge mapping (verified 2026-04-19):

  • https://chat.bhatfamily.in is served on 443/tcp by the shared Caddy edge.
  • Direct host ports 3000/tcp (UI) and 8000/tcp (API) are currently publicly reachable; restrict firewall/NAT exposure if direct internet access is not intended.
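
One way to remove that direct exposure is to bind the published ports to loopback in docker-compose.yml, as sketched below. This assumes the edge proxy runs on the same host (loopback-bound ports are unreachable from other machines), and the container-side ports shown are assumptions; keep whatever the existing docker-compose.yml uses. Note that Docker publishes ports via its own iptables rules, so host firewalls such as ufw may not block them reliably.

```
services:
  gemma3-vllm:
    ports:
      - "127.0.0.1:${BACKEND_PORT:-8000}:8000"
  chat-ui:
    ports:
      - "127.0.0.1:${FRONTEND_PORT:-3000}:8080"
```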