Initial production-ready Gemma 3 vLLM ROCm stack
Co-Authored-By: Oz <oz-agent@warp.dev>
# gemma3-vllm-stack

Production-ready, self-hosted stack for running **Gemma 3** with **vLLM** on AMD ROCm, plus a browser chat UI suitable for publishing at `chat.bhatfamily.in`.
## What this stack provides

- Dockerized **vLLM OpenAI-compatible API** (`/v1`) backed by Gemma 3 on ROCm.
- Dockerized **Open WebUI** chat frontend connected to the local vLLM endpoint.
- Non-interactive scripts for install, restart, uninstall, and smoke testing.
- Documentation for operations, upgrades, and troubleshooting.
## Repository layout

```text
gemma3-vllm-stack/
├── .env.example
├── .gitignore
├── docker-compose.yml
├── README.md
├── backend/
│   ├── Dockerfile
│   └── config/
│       └── model.env.example
├── frontend/
│   ├── Dockerfile
│   └── config/
│       └── frontend.env.example
├── scripts/
│   ├── install.sh
│   ├── restart.sh
│   ├── test_api.sh
│   ├── test_python_client.py
│   ├── test_ui.sh
│   └── uninstall.sh
└── docs/
    ├── ARCHITECTURE.md
    ├── TROUBLESHOOTING.md
    └── UPGRADE_NOTES.md
```
## Architecture summary

- The `gemma3-vllm` service runs `vllm/vllm-openai-rocm` and exposes `http://localhost:${BACKEND_PORT}/v1`.
- The `chat-ui` service runs Open WebUI and exposes `http://localhost:${FRONTEND_PORT}`.
- Open WebUI calls `http://gemma3-vllm:8000/v1` on the internal Docker network.

Detailed architecture: `docs/ARCHITECTURE.md`.
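The service wiring described in the architecture summary can be sketched in `docker-compose.yml` roughly as follows. This is an illustrative fragment, not the repository's actual file: the Open WebUI image tag, the in-container ports, and the `OPENAI_API_BASE_URL` variable name are assumptions to be checked against the real compose file.

```yaml
services:
  gemma3-vllm:
    image: vllm/vllm-openai-rocm        # ROCm build of the vLLM OpenAI server
    devices:
      - /dev/kfd                        # ROCm compute interface
      - /dev/dri                        # GPU render nodes
    env_file: backend/config/model.env
    ports:
      - "${BACKEND_PORT:-8000}:8000"

  chat-ui:
    image: ghcr.io/open-webui/open-webui:main
    environment:
      # Reaches the backend by service name on the internal Docker network.
      OPENAI_API_BASE_URL: http://gemma3-vllm:8000/v1
    ports:
      - "${FRONTEND_PORT:-3000}:8080"   # Open WebUI listens on 8080 internally
```

Because both containers share the default compose network, the UI resolves `gemma3-vllm` by service name and never needs the host ports.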
## Prerequisites

- Ubuntu 22.04 LTS (amd64)
- An AMD ROCm-compatible GPU setup exposing:
  - `/dev/kfd`
  - `/dev/dri`
- Docker Engine and the docker compose plugin (the install script auto-installs these on Ubuntu if missing)
- A Hugging Face token with access to the Gemma 3 model (set as `HF_TOKEN`)
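A quick pre-flight check for the device nodes listed above can save a failed container start. This is a hypothetical helper, not part of the repository's scripts; `install.sh` may perform its own checks.

```shell
#!/bin/sh
# Report whether each ROCm device node required by the containers exists.
check_devices() {
  for dev in "$@"; do
    if [ -e "$dev" ]; then
      echo "present $dev"
    else
      echo "absent $dev"
    fi
  done
}

check_devices /dev/kfd /dev/dri
```

If either node is reported absent, fix the ROCm driver installation before running `./scripts/install.sh`.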
## Quickstart

1. Clone from your Gitea server:

   ```bash
   git clone ssh://git@git.bhatfamily.in/rbhat/gemma3-vllm-stack.git
   cd gemma3-vllm-stack
   ```

2. Create the configuration files:

   ```bash
   cp .env.example .env
   cp backend/config/model.env.example backend/config/model.env
   cp frontend/config/frontend.env.example frontend/config/frontend.env
   ```

3. Edit `.env` and set at least:

   - `HF_TOKEN`
   - `VLLM_API_KEY` (recommended even on a LAN)

4. Install and start the stack:

   ```bash
   ./scripts/install.sh
   ```

5. Run the smoke tests:

   ```bash
   ./scripts/test_api.sh
   ./scripts/test_ui.sh
   python3 scripts/test_python_client.py
   ```

6. Open the UI in a browser:

   - `http://localhost:3000`
   - Reverse proxy externally to `https://chat.bhatfamily.in`
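Right after `install.sh` returns, the backend may still be loading model weights, so the smoke tests can fail transiently. A small polling helper like the following can bridge that gap; it is a sketch, and the real scripts may already retry on their own.

```shell
#!/bin/sh
# Poll a URL until it answers, up to a given number of attempts (default 30).
wait_for() {
  url=$1
  tries=${2:-30}
  i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -fsS "$url" >/dev/null 2>&1; then
      echo "up: $url"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "timed out: $url"
  return 1
}
```

Typical usage before the tests: `wait_for "http://localhost:${BACKEND_PORT:-8000}/v1/models"`.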
## Operations

- Restart the stack:

  ```bash
  ./scripts/restart.sh
  ```

- View logs:

  ```bash
  docker compose logs --tail=200 gemma3-vllm chat-ui
  ```

- Stop and remove stack resources:

  ```bash
  ./scripts/uninstall.sh
  ```

- Stop and remove the stack, and purge local cache, model, and UI data:

  ```bash
  ./scripts/uninstall.sh --purge
  ```
## Upgrade workflow

```bash
git pull
docker compose pull
./scripts/restart.sh
```

More details: `docs/UPGRADE_NOTES.md`.
## Default endpoints

- API base URL: `http://localhost:8000/v1`
- UI URL: `http://localhost:3000`

Adjust these via `.env`:

- `BACKEND_PORT`
- `FRONTEND_PORT`
- `GEMMA_MODEL_ID`
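A minimal `.env` might look like the following. All values are illustrative assumptions; in particular, pick a Gemma 3 variant that your `HF_TOKEN` actually has access to.

```shell
# .env -- example values only
BACKEND_PORT=8000
FRONTEND_PORT=3000
GEMMA_MODEL_ID=google/gemma-3-4b-it   # assumed variant; any Gemma 3 ID works
HF_TOKEN=hf_xxx                       # your Hugging Face access token
VLLM_API_KEY=change-me                # shared secret for the /v1 API
```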
## Notes for `chat.bhatfamily.in`

This repository intentionally does not terminate TLS. Services bind plain HTTP on host ports; TLS and external exposure are expected to be handled by a separate reverse proxy (nginx, Caddy, or a Cloudflare Tunnel).
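For example, an nginx server block in front of the UI might look like this. It is a sketch under the assumption that nginx terminates TLS on the same host; the certificate directives are omitted and left to your certbot or equivalent setup.

```nginx
server {
    listen 443 ssl;
    server_name chat.bhatfamily.in;
    # ssl_certificate / ssl_certificate_key directives go here

    location / {
        proxy_pass http://127.0.0.1:3000;      # host port from FRONTEND_PORT
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto https;
        # Open WebUI streams responses over WebSockets.
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```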