Initial production-ready Gemma 3 vLLM ROCm stack

Co-Authored-By: Oz <oz-agent@warp.dev>
Raghav
2026-04-18 22:53:38 +05:30
commit ef8537e923
18 changed files with 988 additions and 0 deletions

.env.example Normal file
@@ -0,0 +1,9 @@
HF_TOKEN=YOUR_HF_TOKEN_HERE
VLLM_API_KEY=YOUR_LOCAL_API_KEY_HERE
GEMMA_MODEL_ID=google/gemma-3-1b-it
BACKEND_PORT=8000
FRONTEND_PORT=3000
HUGGINGFACE_CACHE_DIR=/home/${USER}/.cache/huggingface
OPEN_WEBUI_DATA_DIR=./frontend/data/open-webui
VLLM_MAX_MODEL_LEN=4096
VLLM_GPU_MEMORY_UTILIZATION=0.88

.gitignore vendored Normal file
@@ -0,0 +1,25 @@
# Environment and secrets
.env
backend/config/model.env
frontend/config/frontend.env
# Python
__pycache__/
*.pyc
*.pyo
*.pyd
.venv/
venv/
# Editor / OS
.DS_Store
.idea/
.vscode/
# Logs
*.log
# Runtime data
backend/data/
frontend/data/
models/

README.md Normal file
@@ -0,0 +1,126 @@
# gemma3-vllm-stack
Production-ready self-hosted stack for running **Gemma 3** with **vLLM** on AMD ROCm, plus a browser chat UI suitable for publishing at `chat.bhatfamily.in`.
## What this stack provides
- Dockerized **vLLM OpenAI-compatible API** (`/v1`) backed by Gemma 3 on ROCm.
- Dockerized **Open WebUI** chat frontend connected to the local vLLM endpoint.
- Non-interactive scripts for install, restart, uninstall, and smoke testing.
- Documentation for operations, upgrades, and troubleshooting.
## Repository layout
```text
gemma3-vllm-stack/
├── .env.example
├── .gitignore
├── docker-compose.yml
├── README.md
├── backend/
│   ├── Dockerfile
│   └── config/
│       └── model.env.example
├── frontend/
│   ├── Dockerfile
│   └── config/
│       └── frontend.env.example
├── scripts/
│   ├── install.sh
│   ├── restart.sh
│   ├── test_api.sh
│   ├── test_python_client.py
│   ├── test_ui.sh
│   └── uninstall.sh
└── docs/
    ├── ARCHITECTURE.md
    ├── README.md
    ├── TROUBLESHOOTING.md
    └── UPGRADE_NOTES.md
```
## Architecture summary
- `gemma3-vllm` service runs `vllm/vllm-openai-rocm` and exposes `http://localhost:${BACKEND_PORT}/v1`.
- `chat-ui` service runs Open WebUI and exposes `http://localhost:${FRONTEND_PORT}`.
- Open WebUI calls `http://gemma3-vllm:8000/v1` on the internal Docker network.
Detailed architecture: `docs/ARCHITECTURE.md`.
## Prerequisites
- Ubuntu 22.04 LTS (amd64)
- AMD ROCm-compatible GPU setup with:
  - `/dev/kfd`
  - `/dev/dri`
- Docker Engine and the Docker Compose plugin (the install script auto-installs them on Ubuntu if missing)
- Hugging Face token with access to the Gemma 3 model (set as `HF_TOKEN`)
## Quickstart
1. Clone from your Gitea server:
```bash
git clone ssh://git@git.bhatfamily.in/rbhat/gemma3-vllm-stack.git
cd gemma3-vllm-stack
```
2. Create configuration files:
```bash
cp .env.example .env
cp backend/config/model.env.example backend/config/model.env
cp frontend/config/frontend.env.example frontend/config/frontend.env
```
3. Edit `.env` and set at least:
   - `HF_TOKEN`
   - `VLLM_API_KEY` (recommended even on LAN)
4. Install/start stack:
```bash
./scripts/install.sh
```
5. Run smoke tests:
```bash
./scripts/test_api.sh
./scripts/test_ui.sh
python3 scripts/test_python_client.py
```
6. Open a browser:
   - `http://localhost:3000`
   - Reverse proxy externally to `https://chat.bhatfamily.in`
## Operations
- Restart stack:
```bash
./scripts/restart.sh
```
- View logs:
```bash
docker compose logs --tail=200 gemma3-vllm chat-ui
```
- Stop and remove stack resources:
```bash
./scripts/uninstall.sh
```
- Stop/remove stack and purge local cache/model/UI data:
```bash
./scripts/uninstall.sh --purge
```
## Upgrade workflow
```bash
git pull
docker compose pull
./scripts/restart.sh
```
More details: `docs/UPGRADE_NOTES.md`.
## Default endpoints
- API base URL: `http://localhost:8000/v1`
- UI URL: `http://localhost:3000`
Adjust using `.env`:
- `BACKEND_PORT`
- `FRONTEND_PORT`
- `GEMMA_MODEL_ID`
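The API base can be exercised directly with `curl` once the stack is up — a minimal sketch, assuming the default port and the `local-dev-key` fallback key from `docker-compose.yml`:

```bash
#!/usr/bin/env bash
# Sketch: one-off chat completion against the local vLLM endpoint.
# BACKEND_PORT / VLLM_API_KEY fall back to this stack's defaults if unset.
BACKEND_PORT="${BACKEND_PORT:-8000}"
VLLM_API_KEY="${VLLM_API_KEY:-local-dev-key}"
API_URL="http://localhost:${BACKEND_PORT}/v1/chat/completions"
echo "POST ${API_URL}"
curl -sS \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${VLLM_API_KEY}" \
  -d '{"model": "google/gemma-3-1b-it", "messages": [{"role": "user", "content": "Hello"}], "max_tokens": 32}' \
  "${API_URL}" || echo "(API not reachable; is the stack running?)"
```

The same request is what `scripts/test_api.sh` performs with response validation.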
## Notes for `chat.bhatfamily.in`
This repository intentionally does not terminate TLS. Bindings are plain HTTP on host ports and are designed for external reverse proxy + TLS handling (nginx/Caddy/Cloudflare Tunnel).
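For example, a minimal TLS fronting can be done with Caddy's built-in reverse proxy command (a sketch only; assumes Caddy is installed on the host and DNS for `chat.bhatfamily.in` points at it):

```bash
#!/usr/bin/env bash
# Sketch: Caddy obtains certificates automatically and proxies to the UI port.
if command -v caddy >/dev/null 2>&1; then
  sudo caddy reverse-proxy --from chat.bhatfamily.in --to localhost:3000
else
  echo "caddy not installed; use nginx or a Cloudflare Tunnel instead"
fi
```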

backend/Dockerfile Normal file
@@ -0,0 +1,4 @@
# Optional backend Dockerfile.
# This stack uses the official vLLM ROCm image directly from docker-compose.yml.
# Keep this file for future customizations.
FROM vllm/vllm-openai-rocm:latest

backend/config/model.env.example Normal file
@@ -0,0 +1,7 @@
HF_TOKEN=YOUR_HF_TOKEN_HERE
VLLM_API_KEY=YOUR_LOCAL_API_KEY_HERE
GEMMA_MODEL_ID=google/gemma-3-1b-it
BACKEND_PORT=8000
HUGGINGFACE_CACHE_DIR=/home/${USER}/.cache/huggingface
VLLM_MAX_MODEL_LEN=4096
VLLM_GPU_MEMORY_UTILIZATION=0.88

docker-compose.yml Normal file
@@ -0,0 +1,69 @@
services:
  gemma3-vllm:
    image: vllm/vllm-openai-rocm:latest
    container_name: gemma3-vllm
    restart: unless-stopped
    env_file:
      - ./backend/config/model.env
    environment:
      HUGGING_FACE_HUB_TOKEN: ${HF_TOKEN}
      HF_TOKEN: ${HF_TOKEN}
      PYTORCH_ROCM_ARCH: gfx1103
    command:
      - --model
      - ${GEMMA_MODEL_ID:-google/gemma-3-1b-it}
      - --host
      - 0.0.0.0
      - --port
      - "8000"
      - --dtype
      - float16
      - --max-model-len
      - ${VLLM_MAX_MODEL_LEN:-4096}
      - --gpu-memory-utilization
      - ${VLLM_GPU_MEMORY_UTILIZATION:-0.88}
      - --api-key
      - ${VLLM_API_KEY:-local-dev-key}
    devices:
      - /dev/kfd
      - /dev/dri
    group_add:
      - video
    cap_add:
      - SYS_PTRACE
    security_opt:
      - seccomp=unconfined
    ports:
      - "${BACKEND_PORT:-8000}:8000"
    volumes:
      - ${HUGGINGFACE_CACHE_DIR:-/home/${USER}/.cache/huggingface}:/root/.cache/huggingface
      - ./models:/models
    healthcheck:
      test: ["CMD-SHELL", "curl -sf http://localhost:8000/health >/dev/null || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 10
      start_period: 120s
  chat-ui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: gemma3-chat-ui
    restart: unless-stopped
    depends_on:
      gemma3-vllm:
        condition: service_started
    env_file:
      - ./frontend/config/frontend.env
    environment:
      WEBUI_AUTH: "False"
      OPENAI_API_BASE_URL: ${OPENAI_API_BASE_URL:-http://gemma3-vllm:8000/v1}
      OPENAI_API_KEY: ${VLLM_API_KEY:-local-dev-key}
      ENABLE_OPENAI_API: "True"
      ENABLE_OLLAMA_API: "False"
      DEFAULT_MODELS: ${GEMMA_MODEL_ID:-google/gemma-3-1b-it}
      GLOBAL_LOG_LEVEL: INFO
      WEBUI_NAME: Gemma 3 via vLLM
    ports:
      - "${FRONTEND_PORT:-3000}:8080"
    volumes:
      - ${OPEN_WEBUI_DATA_DIR:-./frontend/data/open-webui}:/app/backend/data

docs/ARCHITECTURE.md Normal file
@@ -0,0 +1,72 @@
# Architecture
## Component flow
```text
[Browser @ chat.bhatfamily.in]
        |
        | HTTPS (terminated externally)
        v
[Host reverse proxy (external to this repo)]
        |
        | HTTP -> localhost:3000
        v
[chat-ui container: Open WebUI]
        |
        | HTTP (docker internal network)
        v
[gemma3-vllm container: vLLM OpenAI API @ :8000/v1]
        |
        | reads model weights/cache
        v
[Hugging Face cache + local models dir]
        |
        | ROCm runtime
        v
[AMD Radeon 780M (RDNA3 iGPU) via /dev/kfd + /dev/dri]
```
## Services
### `gemma3-vllm`
- Image: `vllm/vllm-openai-rocm:latest`
- Purpose: Run Gemma 3 instruction model through OpenAI-compatible API.
- Host port mapping: `${BACKEND_PORT}:8000` (default `8000:8000`)
- Device passthrough:
  - `/dev/kfd`
  - `/dev/dri`
- Security/capabilities for ROCm debugging compatibility:
  - `cap_add: SYS_PTRACE`
  - `security_opt: seccomp=unconfined`
  - `group_add: video`
### `chat-ui`
- Image: `ghcr.io/open-webui/open-webui:main`
- Purpose: Browser chat experience with local persistence in mounted data directory.
- Host port mapping: `${FRONTEND_PORT}:8080` (default `3000:8080`)
- Upstream model endpoint on the docker network:
  - `OPENAI_API_BASE_URL=http://gemma3-vllm:8000/v1`
## Networking
- Docker Compose default bridge network is used.
- `chat-ui` resolves `gemma3-vllm` by service name.
- External access is via host ports:
  - API: `localhost:8000`
  - UI: `localhost:3000`
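Internal name resolution can be spot-checked from the frontend container — a sketch, assuming the stack is running and `curl` is available inside the Open WebUI image:

```bash
#!/usr/bin/env bash
# Sketch: confirm chat-ui can resolve and reach the backend service name
# over the compose network.
if command -v docker >/dev/null 2>&1; then
  docker compose exec chat-ui curl -sf http://gemma3-vllm:8000/health >/dev/null \
    && echo "backend reachable from chat-ui" \
    || echo "backend not reachable from chat-ui"
else
  echo "docker not installed on this host"
fi
```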
## Storage
- Hugging Face cache bind mount:
  - Host: `${HUGGINGFACE_CACHE_DIR}`
  - Container: `/root/.cache/huggingface`
- Optional local models directory:
  - Host: `./models`
  - Container: `/models`
- Open WebUI data:
  - Host: `${OPEN_WEBUI_DATA_DIR}`
  - Container: `/app/backend/data`
## Scaling notes
This repository is designed for **single-node deployment** on one AMD APU/GPU host.
For larger deployments later:
- Move to dedicated GPUs with larger VRAM.
- Use pinned vLLM image tags and explicit engine tuning.
- Consider externalized model storage and distributed orchestration (Kubernetes/Swarm/Nomad).
- Add request routing, autoscaling, and centralized observability.

docs/README.md Normal file
@@ -0,0 +1,14 @@
# Documentation Index
This folder contains operational and lifecycle documentation for the `gemma3-vllm-stack` repository.
## Files
- `ARCHITECTURE.md`: Component topology, networking, runtime dependencies, and scaling notes.
- `TROUBLESHOOTING.md`: Common failures and copy-paste diagnostics/fixes for ROCm, Docker, vLLM, and UI issues.
- `UPGRADE_NOTES.md`: Safe upgrade, rollback, and backup guidance.
## Recommended reading order
1. `ARCHITECTURE.md`
2. `TROUBLESHOOTING.md`
3. `UPGRADE_NOTES.md`
For quick start and day-1 usage, use the repository root `README.md`.

docs/TROUBLESHOOTING.md Normal file
@@ -0,0 +1,172 @@
# Troubleshooting
## ROCm devices not visible in host
Symptoms:
- `/dev/kfd` missing
- `/dev/dri` missing
- vLLM fails to start with ROCm device errors
Checks:
```bash
ls -l /dev/kfd /dev/dri
id
getent group video
```
Expected:
- `/dev/kfd` exists
- `/dev/dri` directory exists
- user belongs to `video` group
Fixes:
```bash
sudo usermod -aG video "$USER"
newgrp video
```
Then verify ROCm tools:
```bash
rocminfo | sed -n '1,120p'
```
If ROCm is not healthy, fix host ROCm installation first.
---
## Docker and Compose not available
Symptoms:
- `docker: command not found`
- `docker compose version` fails
Checks:
```bash
docker --version
docker compose version
```
Fix using install script (Ubuntu):
```bash
./scripts/install.sh
```
Manual fallback:
```bash
sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu jammy stable" | sudo tee /etc/apt/sources.list.d/docker.list >/dev/null
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker "$USER"
```
Log out/in after group change.
---
## vLLM container exits or fails healthchecks
Symptoms:
- `gemma3-vllm` restarting
- API endpoint unavailable
Checks:
```bash
docker compose ps
docker compose logs --tail=200 gemma3-vllm
```
Common causes and fixes:
1. Missing/invalid Hugging Face token:
```bash
grep -E '^(HF_TOKEN|GEMMA_MODEL_ID)=' .env
```
Ensure `HF_TOKEN` is set to a valid token with access to Gemma 3.
2. Model ID typo:
```bash
grep '^GEMMA_MODEL_ID=' .env
```
Use a valid model, e.g. `google/gemma-3-1b-it`.
3. ROCm runtime/device issues:
```bash
docker run --rm --device=/dev/kfd --device=/dev/dri --group-add video ubuntu:22.04 bash -lc 'ls -l /dev/kfd /dev/dri'
```
4. API key mismatch between backend and UI/tests:
```bash
grep -E '^(VLLM_API_KEY|OPENAI_API_BASE_URL)=' .env frontend/config/frontend.env 2>/dev/null || true
```
Keep keys consistent.
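A quick consistency check across the two files can look like this (a sketch using the paths this repo uses; run it from the repository root):

```bash
#!/usr/bin/env bash
# Sketch: warn if the backend API key and the key the UI sends diverge.
backend_key="$(grep -E '^VLLM_API_KEY=' .env 2>/dev/null | cut -d'=' -f2- || true)"
frontend_key="$(grep -E '^VLLM_API_KEY=' frontend/config/frontend.env 2>/dev/null | cut -d'=' -f2- || true)"
if [ "${backend_key}" = "${frontend_key}" ]; then
  echo "API keys match"
else
  echo "API key MISMATCH between .env and frontend/config/frontend.env"
fi
```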
---
## Out-of-memory (OOM) or low VRAM errors
Symptoms:
- startup failure referencing memory allocation
- runtime generation failures
Checks:
```bash
docker compose logs --tail=300 gemma3-vllm | grep -Ei 'out of memory|oom|memory|cuda|hip|rocm'
```
Mitigations:
1. Reduce context length in `.env`:
```bash
VLLM_MAX_MODEL_LEN=2048
```
2. Lower GPU memory utilization target:
```bash
VLLM_GPU_MEMORY_UTILIZATION=0.75
```
3. Use a smaller Gemma 3 variant in `.env`.
4. Restart stack:
```bash
./scripts/restart.sh
```
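Mitigations 1 and 2 can be applied non-interactively with `sed` — a sketch, assuming GNU `sed` and shown against a scratch copy so the edit is easy to inspect before touching the real `.env`:

```bash
#!/usr/bin/env bash
# Sketch: rewrite the two memory-related settings in a copy of the env file.
cp .env /tmp/env.tuned 2>/dev/null \
  || printf 'VLLM_MAX_MODEL_LEN=4096\nVLLM_GPU_MEMORY_UTILIZATION=0.88\n' > /tmp/env.tuned
sed -i -E 's/^VLLM_MAX_MODEL_LEN=.*/VLLM_MAX_MODEL_LEN=2048/' /tmp/env.tuned
sed -i -E 's/^VLLM_GPU_MEMORY_UTILIZATION=.*/VLLM_GPU_MEMORY_UTILIZATION=0.75/' /tmp/env.tuned
grep -E '^VLLM_(MAX_MODEL_LEN|GPU_MEMORY_UTILIZATION)=' /tmp/env.tuned
```

Once the scratch copy looks right, apply the same `sed` lines to `.env` and restart.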
---
## UI loads but cannot reach vLLM backend
Symptoms:
- Browser opens UI but chat requests fail.
Checks:
```bash
docker compose ps
docker compose logs --tail=200 chat-ui
docker compose logs --tail=200 gemma3-vllm
```
Verify frontend backend URL:
```bash
grep -E '^OPENAI_API_BASE_URL=' frontend/config/frontend.env
```
Expected value:
```text
OPENAI_API_BASE_URL=http://gemma3-vllm:8000/v1
```
Verify API directly from host:
```bash
./scripts/test_api.sh
```
If API works from host but not UI, recreate frontend:
```bash
docker compose up -d --force-recreate chat-ui
```
---
## Health checks and endpoint validation
Run all smoke tests:
```bash
./scripts/test_api.sh
./scripts/test_ui.sh
python3 scripts/test_python_client.py
```
If one fails, inspect corresponding service logs and then restart:
```bash
docker compose logs --tail=200 gemma3-vllm chat-ui
./scripts/restart.sh
```

docs/UPGRADE_NOTES.md Normal file
@@ -0,0 +1,50 @@
# Upgrade Notes
## Standard safe upgrade path
From repository root:
```bash
git pull
docker compose pull
./scripts/restart.sh
```
Then run smoke tests:
```bash
./scripts/test_api.sh
./scripts/test_ui.sh
python3 scripts/test_python_client.py
```
## Versioning guidance
- Prefer pinning image tags in `docker-compose.yml` once your deployment is stable.
- Upgrading vLLM may change runtime defaults or engine behavior; check vLLM release notes before major version jumps.
- Keep `GEMMA_MODEL_ID` explicit in `.env` to avoid unintentional model drift.
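To pin by digest rather than tag, the digest of the currently pulled image can be resolved like this (a sketch; requires the image to have been pulled already):

```bash
#!/usr/bin/env bash
# Sketch: print a pinnable image@sha256:<digest> reference for the backend image.
if command -v docker >/dev/null 2>&1; then
  docker image inspect --format '{{index .RepoDigests 0}}' vllm/vllm-openai-rocm:latest 2>/dev/null \
    || echo "image not pulled yet; run: docker compose pull"
else
  echo "docker not available; pin manually as vllm/vllm-openai-rocm@sha256:<digest>"
fi
```

The printed `image@sha256:...` value can replace the `:latest` tag in `docker-compose.yml`.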
## Model upgrade considerations
When changing Gemma 3 variants (for example, from 1B to larger sizes):
- Verify host RAM and GPU memory capacity.
- Expect re-download of model weights and larger disk usage.
- Re-tune:
  - `VLLM_MAX_MODEL_LEN`
  - `VLLM_GPU_MEMORY_UTILIZATION`
- Re-run validation scripts after restart.
## Backup recommendations
Before major upgrades, back up local persistent data:
```bash
mkdir -p backups
tar -czf backups/hf-cache-$(date +%Y%m%d-%H%M%S).tar.gz "${HOME}/.cache/huggingface"
tar -czf backups/open-webui-data-$(date +%Y%m%d-%H%M%S).tar.gz frontend/data/open-webui
```
If you use local predownloaded models:
```bash
tar -czf backups/models-$(date +%Y%m%d-%H%M%S).tar.gz models
```
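Restoring from such an archive is the inverse operation — a sketch, assuming the `backups/` naming used above and that the stack is stopped first:

```bash
#!/usr/bin/env bash
# Sketch: unpack the newest Open WebUI backup back into place.
latest="$(ls -1t backups/open-webui-data-*.tar.gz 2>/dev/null | head -n1 || true)"
if [ -n "${latest}" ]; then
  tar -xzf "${latest}" -C .
  echo "restored ${latest}"
else
  echo "no Open WebUI backup found under backups/"
fi
```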
## Rollback approach
If a new image/model combination fails:
1. Revert `docker-compose.yml` and `.env` to previous known-good values.
2. Pull previous pinned images (if pinned by tag/digest).
3. Restart:
```bash
./scripts/restart.sh
```
4. Re-run smoke tests.

frontend/Dockerfile Normal file
@@ -0,0 +1,3 @@
# Optional frontend Dockerfile.
# This stack uses the official Open WebUI image directly from docker-compose.yml.
FROM ghcr.io/open-webui/open-webui:main

frontend/config/frontend.env.example Normal file
@@ -0,0 +1,5 @@
FRONTEND_PORT=3000
OPENAI_API_BASE_URL=http://gemma3-vllm:8000/v1
VLLM_API_KEY=YOUR_LOCAL_API_KEY_HERE
GEMMA_MODEL_ID=google/gemma-3-1b-it
OPEN_WEBUI_DATA_DIR=./frontend/data/open-webui

scripts/install.sh Executable file
@@ -0,0 +1,155 @@
#!/usr/bin/env bash
# Installs prerequisites (if needed), prepares config files, and starts Gemma 3 + vLLM + chat UI stack.
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"

log() {
  printf '[install] %s\n' "$*"
}

err() {
  printf '[install][error] %s\n' "$*" >&2
}

require_linux() {
  if [[ "$(uname -s)" != "Linux" ]]; then
    err "This script supports Linux only."
    exit 1
  fi
}

install_docker_ubuntu() {
  log "Installing Docker Engine and Compose plugin using official Docker apt repository."
  sudo apt-get update
  sudo apt-get install -y ca-certificates curl gnupg
  sudo install -m 0755 -d /etc/apt/keyrings
  if [[ ! -f /etc/apt/keyrings/docker.gpg ]]; then
    curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
    sudo chmod a+r /etc/apt/keyrings/docker.gpg
  fi
  source /etc/os-release
  local arch
  arch="$(dpkg --print-architecture)"
  local codename
  codename="${VERSION_CODENAME:-jammy}"
  echo "deb [arch=${arch} signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu ${codename} stable" | sudo tee /etc/apt/sources.list.d/docker.list >/dev/null
  sudo apt-get update
  sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
  if ! sudo systemctl is-active --quiet docker; then
    sudo systemctl enable --now docker
  fi
  if ! getent group docker >/dev/null; then
    sudo groupadd docker
  fi
  if ! id -nG "${USER}" | grep -qw docker; then
    sudo usermod -aG docker "${USER}"
    log "Added ${USER} to docker group. You may need to log out and back in for group membership to apply."
  fi
}

check_or_install_docker() {
  local have_docker=1
  local have_compose=1
  if ! command -v docker >/dev/null 2>&1; then
    have_docker=0
  fi
  if ! docker compose version >/dev/null 2>&1; then
    have_compose=0
  fi
  if [[ ${have_docker} -eq 1 && ${have_compose} -eq 1 ]]; then
    log "Docker and Compose plugin are already available."
    return
  fi
  if [[ -f /etc/os-release ]]; then
    source /etc/os-release
    if [[ "${ID:-}" == "ubuntu" ]]; then
      install_docker_ubuntu
      return
    fi
  fi
  err "Docker/Compose missing and automatic installation is implemented for Ubuntu only."
  err "See docs/TROUBLESHOOTING.md#docker-and-compose-not-available"
  exit 1
}

prepare_env_files() {
  if [[ ! -f "${REPO_ROOT}/.env" ]]; then
    cp "${REPO_ROOT}/.env.example" "${REPO_ROOT}/.env"
    log "Created .env from .env.example."
    err "IMPORTANT: edit .env and set HF_TOKEN (and optionally VLLM_API_KEY) before production use."
  fi
  if [[ ! -f "${REPO_ROOT}/backend/config/model.env" ]]; then
    cp "${REPO_ROOT}/backend/config/model.env.example" "${REPO_ROOT}/backend/config/model.env"
    log "Created backend/config/model.env from example."
  fi
  if [[ ! -f "${REPO_ROOT}/frontend/config/frontend.env" ]]; then
    cp "${REPO_ROOT}/frontend/config/frontend.env.example" "${REPO_ROOT}/frontend/config/frontend.env"
    log "Created frontend/config/frontend.env from example."
  fi
  mkdir -p "${REPO_ROOT}/models" "${REPO_ROOT}/frontend/data/open-webui"
}

warn_if_rocm_devices_missing() {
  if [[ ! -e /dev/kfd || ! -d /dev/dri ]]; then
    err "ROCm device files /dev/kfd or /dev/dri are not available."
    err "See docs/TROUBLESHOOTING.md#rocm-devices-not-visible-in-host"
  fi
}

start_stack() {
  log "Pulling container images."
  docker compose -f "${REPO_ROOT}/docker-compose.yml" --env-file "${REPO_ROOT}/.env" pull
  log "Starting containers in detached mode."
  docker compose -f "${REPO_ROOT}/docker-compose.yml" --env-file "${REPO_ROOT}/.env" up -d
}

show_status_and_urls() {
  local backend_port frontend_port
  backend_port="$(grep -E '^BACKEND_PORT=' "${REPO_ROOT}/.env" | tail -n1 | cut -d'=' -f2 || true)"
  frontend_port="$(grep -E '^FRONTEND_PORT=' "${REPO_ROOT}/.env" | tail -n1 | cut -d'=' -f2 || true)"
  backend_port="${backend_port:-8000}"
  frontend_port="${frontend_port:-3000}"
  log "Backend status:"
  docker compose -f "${REPO_ROOT}/docker-compose.yml" --env-file "${REPO_ROOT}/.env" ps gemma3-vllm || true
  log "Frontend status:"
  docker compose -f "${REPO_ROOT}/docker-compose.yml" --env-file "${REPO_ROOT}/.env" ps chat-ui || true
  printf '\n'
  log "API endpoint: http://localhost:${backend_port}/v1"
  log "Chat UI endpoint: http://localhost:${frontend_port}"
  log "If startup fails, inspect logs with: docker compose logs --tail=200 gemma3-vllm chat-ui"
}

main() {
  require_linux
  check_or_install_docker
  prepare_env_files
  warn_if_rocm_devices_missing
  start_stack
  show_status_and_urls
}

main "$@"

scripts/restart.sh Executable file
@@ -0,0 +1,25 @@
#!/usr/bin/env bash
# Restarts the Gemma 3 vLLM stack and shows service status.
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
ENV_FILE="${REPO_ROOT}/.env"

log() {
  printf '[restart] %s\n' "$*"
}

if [[ ! -f "${ENV_FILE}" ]]; then
  ENV_FILE="${REPO_ROOT}/.env.example"
fi

log "Stopping stack."
docker compose -f "${REPO_ROOT}/docker-compose.yml" --env-file "${ENV_FILE}" down
log "Starting stack."
docker compose -f "${REPO_ROOT}/docker-compose.yml" --env-file "${ENV_FILE}" up -d
log "Current status:"
docker compose -f "${REPO_ROOT}/docker-compose.yml" --env-file "${ENV_FILE}" ps

scripts/test_api.sh Executable file
@@ -0,0 +1,54 @@
#!/usr/bin/env bash
# Tests local vLLM OpenAI-compatible API using curl and validates response shape.
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
ENV_FILE="${REPO_ROOT}/.env"

if [[ ! -f "${ENV_FILE}" ]]; then
  echo "[test_api][error] .env file not found. Copy .env.example to .env first." >&2
  exit 1
fi

# shellcheck disable=SC1090
source "${ENV_FILE}"

BACKEND_PORT="${BACKEND_PORT:-8000}"
GEMMA_MODEL_ID="${GEMMA_MODEL_ID:-google/gemma-3-1b-it}"
VLLM_API_KEY="${VLLM_API_KEY:-EMPTY}"
API_URL="http://localhost:${BACKEND_PORT}/v1/chat/completions"

payload_file="$(mktemp)"
response_file="$(mktemp)"
trap 'rm -f "${payload_file}" "${response_file}"' EXIT

cat > "${payload_file}" <<JSON
{
  "model": "${GEMMA_MODEL_ID}",
  "messages": [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Say hello from Gemma 3 running on vLLM."}
  ],
  "temperature": 0.2,
  "max_tokens": 64
}
JSON

http_status="$(curl -sS -o "${response_file}" -w '%{http_code}' -H "Content-Type: application/json" -H "Authorization: Bearer ${VLLM_API_KEY}" -X POST "${API_URL}" --data @"${payload_file}")"

if [[ ! "${http_status}" =~ ^2 ]]; then
  echo "[test_api][error] API returned HTTP ${http_status}" >&2
  cat "${response_file}" >&2
  echo "[test_api][hint] See docs/TROUBLESHOOTING.md#vllm-container-exits-or-fails-healthchecks" >&2
  exit 1
fi

if ! grep -q '"choices"' "${response_file}"; then
  echo "[test_api][error] API response did not include expected 'choices' field." >&2
  cat "${response_file}" >&2
  exit 1
fi

echo "[test_api] Success. API responded with expected structure."
cat "${response_file}"

scripts/test_python_client.py Executable file
@@ -0,0 +1,75 @@
#!/usr/bin/env python3
"""Tests local vLLM OpenAI-compatible API using openai>=1.x Python client."""
from __future__ import annotations
import os
import sys
from pathlib import Path
def load_dotenv(dotenv_path: Path) -> None:
    """Populate os.environ from a simple KEY=VALUE .env file."""
    if not dotenv_path.exists():
        return
    for raw_line in dotenv_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        os.environ.setdefault(key, value)


def main() -> int:
    repo_root = Path(__file__).resolve().parent.parent
    load_dotenv(repo_root / ".env")
    backend_port = os.getenv("BACKEND_PORT", "8000")
    model_id = os.getenv("GEMMA_MODEL_ID", "google/gemma-3-1b-it")
    api_key = os.getenv("VLLM_API_KEY", "EMPTY")
    base_url = f"http://localhost:{backend_port}/v1"
    try:
        from openai import OpenAI
    except ImportError:
        print("[test_python_client][error] openai package is not installed.", file=sys.stderr)
        print("Install it with: python3 -m pip install openai", file=sys.stderr)
        return 1
    client = OpenAI(api_key=api_key, base_url=base_url)
    try:
        response = client.chat.completions.create(
            model=model_id,
            messages=[
                {"role": "system", "content": "You are a concise assistant."},
                {
                    "role": "user",
                    "content": "Say hello from Gemma 3 running on vLLM in one sentence.",
                },
            ],
            temperature=0.2,
            max_tokens=64,
        )
    except Exception as exc:
        print(f"[test_python_client][error] Request failed: {exc}", file=sys.stderr)
        print(
            "[test_python_client][hint] See docs/TROUBLESHOOTING.md#vllm-container-exits-or-fails-healthchecks",
            file=sys.stderr,
        )
        return 1
    if not response.choices or not response.choices[0].message:
        print("[test_python_client][error] No completion choices returned.", file=sys.stderr)
        return 1
    content = response.choices[0].message.content or ""
    print("[test_python_client] Success. Assistant response:")
    print(content.strip())
    return 0


if __name__ == "__main__":
    raise SystemExit(main())

scripts/test_ui.sh Executable file
@@ -0,0 +1,25 @@
#!/usr/bin/env bash
# Tests whether the chat UI is reachable on localhost frontend port.
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
ENV_FILE="${REPO_ROOT}/.env"

if [[ -f "${ENV_FILE}" ]]; then
  # shellcheck disable=SC1090
  source "${ENV_FILE}"
fi

FRONTEND_PORT="${FRONTEND_PORT:-3000}"
UI_URL="http://localhost:${FRONTEND_PORT}"

http_status="$(curl -sS -o /dev/null -w '%{http_code}' "${UI_URL}")"
if [[ "${http_status}" != "200" && "${http_status}" != "301" && "${http_status}" != "302" ]]; then
  echo "[test_ui][error] UI check failed with HTTP status ${http_status} at ${UI_URL}" >&2
  echo "[test_ui][hint] See docs/TROUBLESHOOTING.md#ui-loads-but-cannot-reach-vllm-backend" >&2
  exit 1
fi
echo "[test_ui] Chat UI is reachable at ${UI_URL} (HTTP ${http_status})."

scripts/uninstall.sh Executable file
@@ -0,0 +1,98 @@
#!/usr/bin/env bash
# Stops and removes the Gemma 3 vLLM stack. Optional --purge removes local model/cache data.
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
PURGE=0

log() {
  printf '[uninstall] %s\n' "$*"
}

err() {
  printf '[uninstall][error] %s\n' "$*" >&2
}

usage() {
  cat <<'EOF'
Usage: scripts/uninstall.sh [--purge]
Options:
  --purge     Remove local Hugging Face cache directory and ./models data in addition to containers/volumes.
  -h, --help  Show this help message.
EOF
}

while [[ $# -gt 0 ]]; do
  case "$1" in
    --purge)
      PURGE=1
      ;;
    -h|--help)
      usage
      exit 0
      ;;
    *)
      err "Unknown argument: $1"
      usage
      exit 1
      ;;
  esac
  shift
done

if [[ ! -f "${REPO_ROOT}/docker-compose.yml" ]]; then
  err "docker-compose.yml not found at ${REPO_ROOT}."
  exit 1
fi

ENV_FILE="${REPO_ROOT}/.env"
if [[ ! -f "${ENV_FILE}" ]]; then
  ENV_FILE="${REPO_ROOT}/.env.example"
fi

log "Stopping stack and removing containers, networks, and named/anonymous volumes."
docker compose -f "${REPO_ROOT}/docker-compose.yml" --env-file "${ENV_FILE}" down -v || true

if [[ ${PURGE} -eq 1 ]]; then
  log "Purge requested. Removing local data directories used by this stack."
  huggingface_cache_dir="$(grep -E '^HUGGINGFACE_CACHE_DIR=' "${ENV_FILE}" | tail -n1 | cut -d'=' -f2- || true)"
  open_webui_data_dir="$(grep -E '^OPEN_WEBUI_DATA_DIR=' "${ENV_FILE}" | tail -n1 | cut -d'=' -f2- || true)"
  if [[ -n "${huggingface_cache_dir}" ]]; then
    # Expand potential variables such as ${USER}
    evaluated_hf_dir="$(eval "printf '%s' \"${huggingface_cache_dir}\"")"
    if [[ -d "${evaluated_hf_dir}" ]]; then
      log "Removing Hugging Face cache directory: ${evaluated_hf_dir}"
      rm -rf "${evaluated_hf_dir}"
    else
      log "Hugging Face cache directory not found: ${evaluated_hf_dir}"
    fi
  fi
  if [[ -z "${open_webui_data_dir}" ]]; then
    open_webui_data_dir="./frontend/data/open-webui"
  fi
  if [[ "${open_webui_data_dir}" == ./* ]]; then
    open_webui_data_dir="${REPO_ROOT}/${open_webui_data_dir#./}"
  fi
  if [[ -d "${open_webui_data_dir}" ]]; then
    log "Removing Open WebUI data directory: ${open_webui_data_dir}"
    rm -rf "${open_webui_data_dir}"
  fi
  if [[ -d "${REPO_ROOT}/models" ]]; then
    log "Removing local models directory: ${REPO_ROOT}/models"
    rm -rf "${REPO_ROOT}/models"
  fi
else
  log "Safe mode enabled (default). Local model/cache data was preserved."
fi

log "Uninstall complete."