Initial production-ready Gemma 3 vLLM ROCm stack
Co-Authored-By: Oz <oz-agent@warp.dev>
9  .env.example  Normal file
@@ -0,0 +1,9 @@
HF_TOKEN=YOUR_HF_TOKEN_HERE
VLLM_API_KEY=YOUR_LOCAL_API_KEY_HERE
GEMMA_MODEL_ID=google/gemma-3-1b-it
BACKEND_PORT=8000
FRONTEND_PORT=3000
HUGGINGFACE_CACHE_DIR=/home/${USER}/.cache/huggingface
OPEN_WEBUI_DATA_DIR=./frontend/data/open-webui
VLLM_MAX_MODEL_LEN=4096
VLLM_GPU_MEMORY_UTILIZATION=0.88
25  .gitignore  vendored  Normal file
@@ -0,0 +1,25 @@
# Environment and secrets
.env
backend/config/model.env
frontend/config/frontend.env

# Python
__pycache__/
*.pyc
*.pyo
*.pyd
.venv/
venv/

# Editor / OS
.DS_Store
.idea/
.vscode/

# Logs
*.log

# Runtime data
backend/data/
frontend/data/
models/
126  README.md  Normal file
@@ -0,0 +1,126 @@
# gemma3-vllm-stack

Production-ready self-hosted stack for running **Gemma 3** with **vLLM** on AMD ROCm, plus a browser chat UI suitable for publishing at `chat.bhatfamily.in`.

## What this stack provides

- Dockerized **vLLM OpenAI-compatible API** (`/v1`) backed by Gemma 3 on ROCm.
- Dockerized **Open WebUI** chat frontend connected to the local vLLM endpoint.
- Non-interactive scripts for install, restart, uninstall, and smoke testing.
- Documentation for operations, upgrades, and troubleshooting.

## Repository layout

```text
gemma3-vllm-stack/
├── .env.example
├── .gitignore
├── docker-compose.yml
├── README.md
├── backend/
│   ├── Dockerfile
│   └── config/
│       └── model.env.example
├── frontend/
│   ├── Dockerfile
│   └── config/
│       └── frontend.env.example
├── scripts/
│   ├── install.sh
│   ├── restart.sh
│   ├── test_api.sh
│   ├── test_python_client.py
│   ├── test_ui.sh
│   └── uninstall.sh
└── docs/
    ├── ARCHITECTURE.md
    ├── TROUBLESHOOTING.md
    └── UPGRADE_NOTES.md
```

## Architecture summary

- The `gemma3-vllm` service runs `vllm/vllm-openai-rocm` and exposes `http://localhost:${BACKEND_PORT}/v1`.
- The `chat-ui` service runs Open WebUI and exposes `http://localhost:${FRONTEND_PORT}`.
- Open WebUI calls `http://gemma3-vllm:8000/v1` on the internal Docker network.

Detailed architecture: `docs/ARCHITECTURE.md`.

## Prerequisites

- Ubuntu 22.04 LTS (amd64)
- AMD ROCm-compatible GPU setup with:
  - `/dev/kfd`
  - `/dev/dri`
- Docker Engine and the docker compose plugin (the install script auto-installs them on Ubuntu if missing)
- Hugging Face token with access to the Gemma 3 model (set as `HF_TOKEN`)

## Quickstart

1. Clone from your Gitea server:

   ```bash
   git clone ssh://git@git.bhatfamily.in/rbhat/gemma3-vllm-stack.git
   cd gemma3-vllm-stack
   ```

2. Create configuration files:

   ```bash
   cp .env.example .env
   cp backend/config/model.env.example backend/config/model.env
   cp frontend/config/frontend.env.example frontend/config/frontend.env
   ```

3. Edit `.env` and set at least:

   - `HF_TOKEN`
   - `VLLM_API_KEY` (recommended even on a LAN)
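Before moving on, it can help to confirm nothing in `.env` was left at its shipped placeholder. A minimal sketch of such a check (not one of the stack's scripts; the sample `.env` content below is written to a temp file purely for illustration):

```python
import re
import tempfile
from pathlib import Path

# Illustrative .env in a temp dir; in practice point this at the real .env.
env_path = Path(tempfile.mkdtemp()) / ".env"
env_path.write_text("HF_TOKEN=YOUR_HF_TOKEN_HERE\nVLLM_API_KEY=s3cret\n")

placeholders = []
for line in env_path.read_text().splitlines():
    line = line.strip()
    if not line or line.startswith("#") or "=" not in line:
        continue
    key, value = line.split("=", 1)
    # The example files ship placeholders shaped like YOUR_..._HERE.
    if re.fullmatch(r"YOUR_\w+_HERE", value.strip()):
        placeholders.append(key)

print(placeholders)  # any keys listed here still need real values
```

Any key reported by this sketch still needs a real value before the backend will start cleanly.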
4. Install/start the stack:

   ```bash
   ./scripts/install.sh
   ```

5. Run smoke tests:

   ```bash
   ./scripts/test_api.sh
   ./scripts/test_ui.sh
   python3 scripts/test_python_client.py
   ```

6. Open a browser:

   - `http://localhost:3000`
   - Reverse proxy externally to `https://chat.bhatfamily.in`

## Operations

- Restart the stack:

  ```bash
  ./scripts/restart.sh
  ```

- View logs:

  ```bash
  docker compose logs --tail=200 gemma3-vllm chat-ui
  ```

- Stop and remove stack resources:

  ```bash
  ./scripts/uninstall.sh
  ```

- Stop/remove the stack and purge local cache/model/UI data:

  ```bash
  ./scripts/uninstall.sh --purge
  ```

## Upgrade workflow

```bash
git pull
docker compose pull
./scripts/restart.sh
```

More details: `docs/UPGRADE_NOTES.md`.

## Default endpoints

- API base URL: `http://localhost:8000/v1`
- UI URL: `http://localhost:3000`

Adjust using `.env`:

- `BACKEND_PORT`
- `FRONTEND_PORT`
- `GEMMA_MODEL_ID`
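For scripting against the API, a client derives its endpoint and auth header from the same values. A small sketch mirroring the defaults used by `scripts/test_python_client.py` (the values below are assumed inline rather than read from `.env`):

```python
# Assumed values; the real scripts read these from .env / os.environ.
backend_port = "8000"
frontend_port = "3000"
vllm_api_key = "local-dev-key"

api_base_url = f"http://localhost:{backend_port}/v1"
ui_url = f"http://localhost:{frontend_port}"
auth_header = {"Authorization": f"Bearer {vllm_api_key}"}

print(api_base_url)
print(ui_url)
```

Any OpenAI-compatible client pointed at `api_base_url` with that bearer header should reach the backend.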
## Notes for `chat.bhatfamily.in`

This repository intentionally does not terminate TLS. Bindings are plain HTTP on host ports and are designed for an external reverse proxy to handle TLS (nginx/Caddy/Cloudflare Tunnel).
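As a sketch only (exact directives depend on your proxy and TLS setup), an external nginx server block for this purpose might look like the following; the certificate paths are placeholders you must replace with your own TLS material:

```nginx
server {
    listen 443 ssl;
    server_name chat.bhatfamily.in;

    # TLS material is managed outside this repo; these paths are placeholders.
    ssl_certificate     /etc/ssl/certs/chat.bhatfamily.in.pem;
    ssl_certificate_key /etc/ssl/private/chat.bhatfamily.in.key;

    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto $scheme;
        # Open WebUI uses websockets; forward the upgrade headers.
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```

Caddy or a Cloudflare Tunnel would achieve the same with their own configuration syntax.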
4  backend/Dockerfile  Normal file
@@ -0,0 +1,4 @@
# Optional backend Dockerfile.
# This stack uses the official vLLM ROCm image directly from docker-compose.yml.
# Keep this file for future customizations.
FROM vllm/vllm-openai-rocm:latest
7  backend/config/model.env.example  Normal file
@@ -0,0 +1,7 @@
HF_TOKEN=YOUR_HF_TOKEN_HERE
VLLM_API_KEY=YOUR_LOCAL_API_KEY_HERE
GEMMA_MODEL_ID=google/gemma-3-1b-it
BACKEND_PORT=8000
HUGGINGFACE_CACHE_DIR=/home/${USER}/.cache/huggingface
VLLM_MAX_MODEL_LEN=4096
VLLM_GPU_MEMORY_UTILIZATION=0.88
69  docker-compose.yml  Normal file
@@ -0,0 +1,69 @@
services:
  gemma3-vllm:
    image: vllm/vllm-openai-rocm:latest
    container_name: gemma3-vllm
    restart: unless-stopped
    env_file:
      - ./backend/config/model.env
    environment:
      HUGGING_FACE_HUB_TOKEN: ${HF_TOKEN}
      HF_TOKEN: ${HF_TOKEN}
      PYTORCH_ROCM_ARCH: gfx1103
    command:
      - --model
      - ${GEMMA_MODEL_ID:-google/gemma-3-1b-it}
      - --host
      - 0.0.0.0
      - --port
      - "8000"
      - --dtype
      - float16
      - --max-model-len
      - ${VLLM_MAX_MODEL_LEN:-4096}
      - --gpu-memory-utilization
      - ${VLLM_GPU_MEMORY_UTILIZATION:-0.88}
      - --api-key
      - ${VLLM_API_KEY:-local-dev-key}
    devices:
      - /dev/kfd
      - /dev/dri
    group_add:
      - video
    cap_add:
      - SYS_PTRACE
    security_opt:
      - seccomp=unconfined
    ports:
      - "${BACKEND_PORT:-8000}:8000"
    volumes:
      - ${HUGGINGFACE_CACHE_DIR:-/home/${USER}/.cache/huggingface}:/root/.cache/huggingface
      - ./models:/models
    healthcheck:
      test: ["CMD-SHELL", "curl -sf http://localhost:8000/health >/dev/null || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 10
      start_period: 120s

  chat-ui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: gemma3-chat-ui
    restart: unless-stopped
    depends_on:
      gemma3-vllm:
        condition: service_started
    env_file:
      - ./frontend/config/frontend.env
    environment:
      WEBUI_AUTH: "False"
      OPENAI_API_BASE_URL: ${OPENAI_API_BASE_URL:-http://gemma3-vllm:8000/v1}
      OPENAI_API_KEY: ${VLLM_API_KEY:-local-dev-key}
      ENABLE_OPENAI_API: "True"
      ENABLE_OLLAMA_API: "False"
      DEFAULT_MODELS: ${GEMMA_MODEL_ID:-google/gemma-3-1b-it}
      GLOBAL_LOG_LEVEL: INFO
      WEBUI_NAME: Gemma 3 via vLLM
    ports:
      - "${FRONTEND_PORT:-3000}:8080"
    volumes:
      - ${OPEN_WEBUI_DATA_DIR:-./frontend/data/open-webui}:/app/backend/data
72  docs/ARCHITECTURE.md  Normal file
@@ -0,0 +1,72 @@
# Architecture

## Component flow

```text
[Browser @ chat.bhatfamily.in]
        |
        | HTTPS (terminated externally)
        v
[Host reverse proxy (external to this repo)]
        |
        | HTTP -> localhost:3000
        v
[chat-ui container: Open WebUI]
        |
        | HTTP (docker internal network)
        v
[gemma3-vllm container: vLLM OpenAI API @ :8000/v1]
        |
        | reads model weights/cache
        v
[Hugging Face cache + local models dir]
        |
        | ROCm runtime
        v
[AMD Radeon 780M (RDNA3 iGPU) via /dev/kfd + /dev/dri]
```

## Services

### `gemma3-vllm`

- Image: `vllm/vllm-openai-rocm:latest`
- Purpose: run the Gemma 3 instruction model behind an OpenAI-compatible API.
- Host port mapping: `${BACKEND_PORT}:8000` (default `8000:8000`)
- Device passthrough:
  - `/dev/kfd`
  - `/dev/dri`
- Security/capabilities for ROCm debugging compatibility:
  - `cap_add: SYS_PTRACE`
  - `security_opt: seccomp=unconfined`
  - `group_add: video`

### `chat-ui`

- Image: `ghcr.io/open-webui/open-webui:main`
- Purpose: browser chat experience with local persistence in the mounted data directory.
- Host port mapping: `${FRONTEND_PORT}:8080` (default `3000:8080`)
- Upstream model endpoint on the docker network:
  - `OPENAI_API_BASE_URL=http://gemma3-vllm:8000/v1`

## Networking

- The Docker Compose default bridge network is used.
- `chat-ui` resolves `gemma3-vllm` by service name.
- External access is via host ports:
  - API: `localhost:8000`
  - UI: `localhost:3000`

## Storage

- Hugging Face cache bind mount:
  - Host: `${HUGGINGFACE_CACHE_DIR}`
  - Container: `/root/.cache/huggingface`
- Optional local models directory:
  - Host: `./models`
  - Container: `/models`
- Open WebUI data:
  - Host: `${OPEN_WEBUI_DATA_DIR}`
  - Container: `/app/backend/data`

## Scaling notes

This repository is designed for **single-node deployment** on one AMD APU/GPU host.

For larger deployments later:

- Move to dedicated GPUs with larger VRAM.
- Use pinned vLLM image tags and explicit engine tuning.
- Consider externalized model storage and distributed orchestration (Kubernetes/Swarm/Nomad).
- Add request routing, autoscaling, and centralized observability.
14  docs/README.md  Normal file
@@ -0,0 +1,14 @@
# Documentation Index

This folder contains operational and lifecycle documentation for the `gemma3-vllm-stack` repository.

## Files

- `ARCHITECTURE.md`: component topology, networking, runtime dependencies, and scaling notes.
- `TROUBLESHOOTING.md`: common failures and copy-paste diagnostics/fixes for ROCm, Docker, vLLM, and UI issues.
- `UPGRADE_NOTES.md`: safe upgrade, rollback, and backup guidance.

## Recommended reading order

1. `ARCHITECTURE.md`
2. `TROUBLESHOOTING.md`
3. `UPGRADE_NOTES.md`

For quick start and day-1 usage, use the repository root `README.md`.
172  docs/TROUBLESHOOTING.md  Normal file
@@ -0,0 +1,172 @@
# Troubleshooting

## ROCm devices not visible in host

Symptoms:

- `/dev/kfd` missing
- `/dev/dri` missing
- vLLM fails to start with ROCm device errors

Checks:

```bash
ls -l /dev/kfd /dev/dri
id
getent group video
```

Expected:

- `/dev/kfd` exists
- `/dev/dri` directory exists
- the user belongs to the `video` group

Fixes:

```bash
sudo usermod -aG video "$USER"
newgrp video
```

Then verify ROCm tools:

```bash
rocminfo | sed -n '1,120p'
```

If ROCm is not healthy, fix the host ROCm installation first.

---

## Docker and Compose not available

Symptoms:

- `docker: command not found`
- `docker compose version` fails

Checks:

```bash
docker --version
docker compose version
```

Fix using the install script (Ubuntu):

```bash
./scripts/install.sh
```

Manual fallback:

```bash
sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu jammy stable" | sudo tee /etc/apt/sources.list.d/docker.list >/dev/null
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker "$USER"
```

Log out and back in after the group change.

---

## vLLM container exits or fails healthchecks

Symptoms:

- `gemma3-vllm` keeps restarting
- API endpoint unavailable

Checks:

```bash
docker compose ps
docker compose logs --tail=200 gemma3-vllm
```

Common causes and fixes:

1. Missing/invalid Hugging Face token:

   ```bash
   grep -E '^(HF_TOKEN|GEMMA_MODEL_ID)=' .env
   ```

   Ensure `HF_TOKEN` is set to a valid token with access to Gemma 3.

2. Model ID typo:

   ```bash
   grep '^GEMMA_MODEL_ID=' .env
   ```

   Use a valid model, e.g. `google/gemma-3-1b-it`.

3. ROCm runtime/device issues:

   ```bash
   docker run --rm --device=/dev/kfd --device=/dev/dri --group-add video ubuntu:22.04 bash -lc 'ls -l /dev/kfd /dev/dri'
   ```

4. API key mismatch between the backend and the UI/tests:

   ```bash
   grep -E '^(VLLM_API_KEY|OPENAI_API_BASE_URL)=' .env frontend/config/frontend.env 2>/dev/null || true
   ```

   Keep the keys consistent.
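The same consistency check can be expressed as code. A sketch with inlined stand-ins for `.env` and `frontend/config/frontend.env` (the tiny parser mirrors the dotenv loader in `scripts/test_python_client.py`):

```python
def parse_env(text: str) -> dict:
    """Tiny dotenv-style parser: skips blanks, comments, and lines without '='."""
    pairs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        pairs[key.strip()] = value.strip()
    return pairs

# Inlined stand-ins for the two real files.
root_env = parse_env("VLLM_API_KEY=s3cret\nBACKEND_PORT=8000\n")
frontend_env = parse_env(
    "# frontend\nVLLM_API_KEY=s3cret\nOPENAI_API_BASE_URL=http://gemma3-vllm:8000/v1\n"
)

keys_match = root_env.get("VLLM_API_KEY") == frontend_env.get("VLLM_API_KEY")
print("VLLM_API_KEY consistent:", keys_match)
```

If the comparison prints `False` against your real files, update one side and recreate the `chat-ui` container.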
---

## Out-of-memory (OOM) or low-VRAM errors

Symptoms:

- startup failure referencing memory allocation
- runtime generation failures

Checks:

```bash
docker compose logs --tail=300 gemma3-vllm | grep -Ei 'out of memory|oom|memory|cuda|hip|rocm'
```

Mitigations:

1. Reduce the context length in `.env`:

   ```bash
   VLLM_MAX_MODEL_LEN=2048
   ```

2. Lower the GPU memory utilization target:

   ```bash
   VLLM_GPU_MEMORY_UTILIZATION=0.75
   ```

3. Use a smaller Gemma 3 variant in `.env`.

4. Restart the stack:

   ```bash
   ./scripts/restart.sh
   ```
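Why reducing `VLLM_MAX_MODEL_LEN` helps: the KV cache grows linearly with context length. A back-of-the-envelope sketch (the layer/head numbers below are placeholders, not Gemma 3's actual configuration; read the real values from the model's `config.json`):

```python
def kv_cache_bytes(max_model_len: int, num_layers: int, num_kv_heads: int,
                   head_dim: int, dtype_bytes: int = 2) -> int:
    # 2x: one tensor each for keys and values, per layer, per token slot.
    return 2 * num_layers * max_model_len * num_kv_heads * head_dim * dtype_bytes

# Placeholder hyperparameters for illustration only.
full = kv_cache_bytes(4096, num_layers=24, num_kv_heads=8, head_dim=128)
half = kv_cache_bytes(2048, num_layers=24, num_kv_heads=8, head_dim=128)
print(full // 2**20, "MiB at 4096 tokens vs", half // 2**20, "MiB at 2048")
```

Halving the context halves this per-sequence budget, which is often enough headroom to clear an OOM on a small iGPU VRAM carve-out.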
---

## UI loads but cannot reach vLLM backend

Symptoms:

- The browser opens the UI, but chat requests fail.

Checks:

```bash
docker compose ps
docker compose logs --tail=200 chat-ui
docker compose logs --tail=200 gemma3-vllm
```

Verify the frontend's backend URL:

```bash
grep -E '^OPENAI_API_BASE_URL=' frontend/config/frontend.env
```

Expected value:

```text
OPENAI_API_BASE_URL=http://gemma3-vllm:8000/v1
```

Verify the API directly from the host:

```bash
./scripts/test_api.sh
```

If the API works from the host but not from the UI, recreate the frontend:

```bash
docker compose up -d --force-recreate chat-ui
```

---

## Health checks and endpoint validation

Run all smoke tests:

```bash
./scripts/test_api.sh
./scripts/test_ui.sh
python3 scripts/test_python_client.py
```

If one fails, inspect the corresponding service logs and then restart:

```bash
docker compose logs --tail=200 gemma3-vllm chat-ui
./scripts/restart.sh
```
50  docs/UPGRADE_NOTES.md  Normal file
@@ -0,0 +1,50 @@
# Upgrade Notes

## Standard safe upgrade path

From the repository root:

```bash
git pull
docker compose pull
./scripts/restart.sh
```

Then run the smoke tests:

```bash
./scripts/test_api.sh
./scripts/test_ui.sh
python3 scripts/test_python_client.py
```

## Versioning guidance

- Prefer pinning image tags in `docker-compose.yml` once your deployment is stable.
- Upgrading vLLM may change runtime defaults or engine behavior; check the vLLM release notes before major version jumps.
- Keep `GEMMA_MODEL_ID` explicit in `.env` to avoid unintentional model drift.

## Model upgrade considerations

When changing Gemma 3 variants (for example, from 1B to larger sizes):

- Verify host RAM and GPU memory capacity.
- Expect a re-download of model weights and larger disk usage.
- Re-tune:
  - `VLLM_MAX_MODEL_LEN`
  - `VLLM_GPU_MEMORY_UTILIZATION`
- Re-run the validation scripts after restart.

## Backup recommendations

Before major upgrades, back up local persistent data:

```bash
mkdir -p backups
tar -czf backups/hf-cache-$(date +%Y%m%d-%H%M%S).tar.gz "${HOME}/.cache/huggingface"
tar -czf backups/open-webui-data-$(date +%Y%m%d-%H%M%S).tar.gz frontend/data/open-webui
```

If you use locally predownloaded models:

```bash
tar -czf backups/models-$(date +%Y%m%d-%H%M%S).tar.gz models
```
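Before relying on a backup, it is worth confirming the archive is readable. A sketch using Python's `tarfile` against a throwaway archive (point `archive` at your real backup instead):

```python
import tarfile
import tempfile
from pathlib import Path

# Build a throwaway archive so the example is self-contained.
tmp = Path(tempfile.mkdtemp())
(tmp / "data").mkdir()
(tmp / "data" / "sample.txt").write_text("hello")
archive = tmp / "backup.tar.gz"
with tarfile.open(archive, "w:gz") as tf:
    tf.add(tmp / "data", arcname="data")

# Listing the members without extracting confirms the archive is intact.
with tarfile.open(archive, "r:gz") as tf:
    names = tf.getnames()
print(names)
```

`tar -tzf` from the shell performs the same non-destructive listing.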
## Rollback approach

If a new image/model combination fails:

1. Revert `docker-compose.yml` and `.env` to the previous known-good values.
2. Pull the previous pinned images (if pinned by tag/digest).
3. Restart:

   ```bash
   ./scripts/restart.sh
   ```

4. Re-run the smoke tests.
3  frontend/Dockerfile  Normal file
@@ -0,0 +1,3 @@
# Optional frontend Dockerfile.
# This stack uses the official Open WebUI image directly from docker-compose.yml.
FROM ghcr.io/open-webui/open-webui:main
5  frontend/config/frontend.env.example  Normal file
@@ -0,0 +1,5 @@
FRONTEND_PORT=3000
OPENAI_API_BASE_URL=http://gemma3-vllm:8000/v1
VLLM_API_KEY=YOUR_LOCAL_API_KEY_HERE
GEMMA_MODEL_ID=google/gemma-3-1b-it
OPEN_WEBUI_DATA_DIR=./frontend/data/open-webui
155  scripts/install.sh  Executable file
@@ -0,0 +1,155 @@
#!/usr/bin/env bash
# Installs prerequisites (if needed), prepares config files, and starts the Gemma 3 + vLLM + chat UI stack.
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"

log() {
  printf '[install] %s\n' "$*"
}

err() {
  printf '[install][error] %s\n' "$*" >&2
}

require_linux() {
  if [[ "$(uname -s)" != "Linux" ]]; then
    err "This script supports Linux only."
    exit 1
  fi
}

install_docker_ubuntu() {
  log "Installing Docker Engine and the Compose plugin using the official Docker apt repository."
  sudo apt-get update
  sudo apt-get install -y ca-certificates curl gnupg
  sudo install -m 0755 -d /etc/apt/keyrings

  if [[ ! -f /etc/apt/keyrings/docker.gpg ]]; then
    curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
    sudo chmod a+r /etc/apt/keyrings/docker.gpg
  fi

  source /etc/os-release
  local arch
  arch="$(dpkg --print-architecture)"
  local codename
  codename="${VERSION_CODENAME:-jammy}"

  echo "deb [arch=${arch} signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu ${codename} stable" | sudo tee /etc/apt/sources.list.d/docker.list >/dev/null

  sudo apt-get update
  sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

  if ! sudo systemctl is-active --quiet docker; then
    sudo systemctl enable --now docker
  fi

  if ! getent group docker >/dev/null; then
    sudo groupadd docker
  fi

  if ! id -nG "${USER}" | grep -qw docker; then
    sudo usermod -aG docker "${USER}"
    log "Added ${USER} to the docker group. You may need to log out and back in for the membership to apply."
  fi
}

check_or_install_docker() {
  local have_docker=1
  local have_compose=1

  if ! command -v docker >/dev/null 2>&1; then
    have_docker=0
  fi

  if ! docker compose version >/dev/null 2>&1; then
    have_compose=0
  fi

  if [[ ${have_docker} -eq 1 && ${have_compose} -eq 1 ]]; then
    log "Docker and the Compose plugin are already available."
    return
  fi

  if [[ -f /etc/os-release ]]; then
    source /etc/os-release
    if [[ "${ID:-}" == "ubuntu" ]]; then
      install_docker_ubuntu
      return
    fi
  fi

  err "Docker/Compose missing and automatic installation is implemented for Ubuntu only."
  err "See docs/TROUBLESHOOTING.md#docker-and-compose-not-available"
  exit 1
}

prepare_env_files() {
  if [[ ! -f "${REPO_ROOT}/.env" ]]; then
    cp "${REPO_ROOT}/.env.example" "${REPO_ROOT}/.env"
    log "Created .env from .env.example."
    err "IMPORTANT: edit .env and set HF_TOKEN (and optionally VLLM_API_KEY) before production use."
  fi

  if [[ ! -f "${REPO_ROOT}/backend/config/model.env" ]]; then
    cp "${REPO_ROOT}/backend/config/model.env.example" "${REPO_ROOT}/backend/config/model.env"
    log "Created backend/config/model.env from example."
  fi

  if [[ ! -f "${REPO_ROOT}/frontend/config/frontend.env" ]]; then
    cp "${REPO_ROOT}/frontend/config/frontend.env.example" "${REPO_ROOT}/frontend/config/frontend.env"
    log "Created frontend/config/frontend.env from example."
  fi

  mkdir -p "${REPO_ROOT}/models" "${REPO_ROOT}/frontend/data/open-webui"
}

warn_if_rocm_devices_missing() {
  if [[ ! -e /dev/kfd || ! -d /dev/dri ]]; then
    err "ROCm device files /dev/kfd or /dev/dri are not available."
    err "See docs/TROUBLESHOOTING.md#rocm-devices-not-visible-in-host"
  fi
}

start_stack() {
  log "Pulling container images."
  docker compose -f "${REPO_ROOT}/docker-compose.yml" --env-file "${REPO_ROOT}/.env" pull

  log "Starting containers in detached mode."
  docker compose -f "${REPO_ROOT}/docker-compose.yml" --env-file "${REPO_ROOT}/.env" up -d
}

show_status_and_urls() {
  local backend_port frontend_port
  backend_port="$(grep -E '^BACKEND_PORT=' "${REPO_ROOT}/.env" | tail -n1 | cut -d'=' -f2 || true)"
  frontend_port="$(grep -E '^FRONTEND_PORT=' "${REPO_ROOT}/.env" | tail -n1 | cut -d'=' -f2 || true)"
  backend_port="${backend_port:-8000}"
  frontend_port="${frontend_port:-3000}"

  log "Backend status:"
  docker compose -f "${REPO_ROOT}/docker-compose.yml" --env-file "${REPO_ROOT}/.env" ps gemma3-vllm || true

  log "Frontend status:"
  docker compose -f "${REPO_ROOT}/docker-compose.yml" --env-file "${REPO_ROOT}/.env" ps chat-ui || true

  printf '\n'
  log "API endpoint: http://localhost:${backend_port}/v1"
  log "Chat UI endpoint: http://localhost:${frontend_port}"
  log "If startup fails, inspect logs with: docker compose logs --tail=200 gemma3-vllm chat-ui"
}

main() {
  require_linux
  check_or_install_docker
  prepare_env_files
  warn_if_rocm_devices_missing
  start_stack
  show_status_and_urls
}

main "$@"
25  scripts/restart.sh  Executable file
@@ -0,0 +1,25 @@
#!/usr/bin/env bash
# Restarts the Gemma 3 vLLM stack and shows service status.
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
ENV_FILE="${REPO_ROOT}/.env"

log() {
  printf '[restart] %s\n' "$*"
}

if [[ ! -f "${ENV_FILE}" ]]; then
  ENV_FILE="${REPO_ROOT}/.env.example"
fi

log "Stopping stack."
docker compose -f "${REPO_ROOT}/docker-compose.yml" --env-file "${ENV_FILE}" down

log "Starting stack."
docker compose -f "${REPO_ROOT}/docker-compose.yml" --env-file "${ENV_FILE}" up -d

log "Current status:"
docker compose -f "${REPO_ROOT}/docker-compose.yml" --env-file "${ENV_FILE}" ps
54  scripts/test_api.sh  Executable file
@@ -0,0 +1,54 @@
#!/usr/bin/env bash
# Tests the local vLLM OpenAI-compatible API using curl and validates the response shape.
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
ENV_FILE="${REPO_ROOT}/.env"

if [[ ! -f "${ENV_FILE}" ]]; then
  echo "[test_api][error] .env file not found. Copy .env.example to .env first." >&2
  exit 1
fi

# shellcheck disable=SC1090
source "${ENV_FILE}"

BACKEND_PORT="${BACKEND_PORT:-8000}"
GEMMA_MODEL_ID="${GEMMA_MODEL_ID:-google/gemma-3-1b-it}"
VLLM_API_KEY="${VLLM_API_KEY:-EMPTY}"
API_URL="http://localhost:${BACKEND_PORT}/v1/chat/completions"

payload_file="$(mktemp)"
response_file="$(mktemp)"
trap 'rm -f "${payload_file}" "${response_file}"' EXIT

cat > "${payload_file}" <<JSON
{
  "model": "${GEMMA_MODEL_ID}",
  "messages": [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Say hello from Gemma 3 running on vLLM."}
  ],
  "temperature": 0.2,
  "max_tokens": 64
}
JSON

http_status="$(curl -sS -o "${response_file}" -w '%{http_code}' -H "Content-Type: application/json" -H "Authorization: Bearer ${VLLM_API_KEY}" -X POST "${API_URL}" --data @"${payload_file}")"

if [[ ! "${http_status}" =~ ^2 ]]; then
  echo "[test_api][error] API returned HTTP ${http_status}" >&2
  cat "${response_file}" >&2
  echo "[test_api][hint] See docs/TROUBLESHOOTING.md#vllm-container-exits-or-fails-healthchecks" >&2
  exit 1
fi

if ! grep -q '"choices"' "${response_file}"; then
  echo "[test_api][error] API response did not include the expected 'choices' field." >&2
  cat "${response_file}" >&2
  exit 1
fi

echo "[test_api] Success. API responded with the expected structure."
cat "${response_file}"
75
scripts/test_python_client.py
Executable file
75
scripts/test_python_client.py
Executable file
@ -0,0 +1,75 @@
#!/usr/bin/env python3
"""Tests the local vLLM OpenAI-compatible API using the openai>=1.x Python client."""

from __future__ import annotations

import os
import sys
from pathlib import Path


def load_dotenv(dotenv_path: Path) -> None:
    if not dotenv_path.exists():
        return

    for raw_line in dotenv_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        os.environ.setdefault(key, value)


def main() -> int:
    repo_root = Path(__file__).resolve().parent.parent
    load_dotenv(repo_root / ".env")

    backend_port = os.getenv("BACKEND_PORT", "8000")
    model_id = os.getenv("GEMMA_MODEL_ID", "google/gemma-3-1b-it")
    api_key = os.getenv("VLLM_API_KEY", "EMPTY")
    base_url = f"http://localhost:{backend_port}/v1"

    try:
        from openai import OpenAI
    except ImportError:
        print("[test_python_client][error] openai package is not installed.", file=sys.stderr)
        print("Install it with: python3 -m pip install openai", file=sys.stderr)
        return 1

    client = OpenAI(api_key=api_key, base_url=base_url)

    try:
        response = client.chat.completions.create(
            model=model_id,
            messages=[
                {"role": "system", "content": "You are a concise assistant."},
                {
                    "role": "user",
                    "content": "Say hello from Gemma 3 running on vLLM in one sentence.",
                },
            ],
            temperature=0.2,
            max_tokens=64,
        )
    except Exception as exc:
        print(f"[test_python_client][error] Request failed: {exc}", file=sys.stderr)
        print(
            "[test_python_client][hint] See docs/TROUBLESHOOTING.md#vllm-container-exits-or-fails-healthchecks",
            file=sys.stderr,
        )
        return 1

    if not response.choices or not response.choices[0].message:
        print("[test_python_client][error] No completion choices returned.", file=sys.stderr)
        return 1

    content = response.choices[0].message.content or ""
    print("[test_python_client] Success. Assistant response:")
    print(content.strip())
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
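The `load_dotenv` helper above is a minimal `KEY=VALUE` parser with setdefault semantics: values already present in the real environment win over `.env`. A standalone shell sketch of the same convention, using a throwaway file and an illustrative `DEMO_BACKEND_PORT` key (not a variable the stack actually uses):

```shell
# Minimal .env reader: skip blank/comment lines, split on the first '=',
# and only set a key if it is not already in the environment.
envfile="$(mktemp)"
cat > "${envfile}" <<'EOF'
# comment lines are ignored
DEMO_BACKEND_PORT=9001
EOF

while IFS='=' read -r key value; do
  case "${key}" in ''|\#*) continue ;; esac
  if [ -z "$(printenv "${key}")" ]; then
    export "${key}=${value}"
  fi
done < "${envfile}"
rm -f "${envfile}"

echo "${DEMO_BACKEND_PORT}"
```

Because the loop exports only unset keys, running this with `DEMO_BACKEND_PORT` already exported would leave the existing value untouched, mirroring `os.environ.setdefault`.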
scripts/test_ui.sh (new executable file, 25 lines)
@@ -0,0 +1,25 @@
#!/usr/bin/env bash
# Tests whether the chat UI is reachable on the local frontend port.
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
ENV_FILE="${REPO_ROOT}/.env"

if [[ -f "${ENV_FILE}" ]]; then
  # shellcheck disable=SC1090
  source "${ENV_FILE}"
fi

FRONTEND_PORT="${FRONTEND_PORT:-3000}"
UI_URL="http://localhost:${FRONTEND_PORT}"

http_status="$(curl -sS -o /dev/null -w '%{http_code}' "${UI_URL}")"

if [[ "${http_status}" != "200" && "${http_status}" != "301" && "${http_status}" != "302" ]]; then
  echo "[test_ui][error] UI check failed with HTTP status ${http_status} at ${UI_URL}" >&2
  echo "[test_ui][hint] See docs/TROUBLESHOOTING.md#ui-loads-but-cannot-reach-vllm-backend" >&2
  exit 1
fi

echo "[test_ui] Chat UI is reachable at ${UI_URL} (HTTP ${http_status})."
scripts/uninstall.sh (new executable file, 98 lines)
@@ -0,0 +1,98 @@
#!/usr/bin/env bash
# Stops and removes the Gemma 3 vLLM stack. Optional --purge removes local model/cache data.
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
PURGE=0

log() {
  printf '[uninstall] %s\n' "$*"
}

err() {
  printf '[uninstall][error] %s\n' "$*" >&2
}

usage() {
  cat <<'EOF'
Usage: scripts/uninstall.sh [--purge]

Options:
  --purge      Remove local Hugging Face cache directory and ./models data in addition to containers/volumes.
  -h, --help   Show this help message.
EOF
}

while [[ $# -gt 0 ]]; do
  case "$1" in
    --purge)
      PURGE=1
      ;;
    -h|--help)
      usage
      exit 0
      ;;
    *)
      err "Unknown argument: $1"
      usage
      exit 1
      ;;
  esac
  shift
done

if [[ ! -f "${REPO_ROOT}/docker-compose.yml" ]]; then
  err "docker-compose.yml not found at ${REPO_ROOT}."
  exit 1
fi

ENV_FILE="${REPO_ROOT}/.env"
if [[ ! -f "${ENV_FILE}" ]]; then
  ENV_FILE="${REPO_ROOT}/.env.example"
fi

log "Stopping stack and removing containers, networks, and named/anonymous volumes."
docker compose -f "${REPO_ROOT}/docker-compose.yml" --env-file "${ENV_FILE}" down -v || true

if [[ ${PURGE} -eq 1 ]]; then
  log "Purge requested. Removing local data directories used by this stack."

  huggingface_cache_dir="$(grep -E '^HUGGINGFACE_CACHE_DIR=' "${ENV_FILE}" | tail -n1 | cut -d'=' -f2- || true)"
  open_webui_data_dir="$(grep -E '^OPEN_WEBUI_DATA_DIR=' "${ENV_FILE}" | tail -n1 | cut -d'=' -f2- || true)"

  if [[ -n "${huggingface_cache_dir}" ]]; then
    # Expand potential variables such as ${USER}
    evaluated_hf_dir="$(eval "printf '%s' \"${huggingface_cache_dir}\"")"
    if [[ -d "${evaluated_hf_dir}" ]]; then
      log "Removing Hugging Face cache directory: ${evaluated_hf_dir}"
      rm -rf "${evaluated_hf_dir}"
    else
      log "Hugging Face cache directory not found: ${evaluated_hf_dir}"
    fi
  fi

  if [[ -z "${open_webui_data_dir}" ]]; then
    open_webui_data_dir="./frontend/data/open-webui"
  fi

  if [[ "${open_webui_data_dir}" == ./* ]]; then
    open_webui_data_dir="${REPO_ROOT}/${open_webui_data_dir#./}"
  fi

  if [[ -d "${open_webui_data_dir}" ]]; then
    log "Removing Open WebUI data directory: ${open_webui_data_dir}"
    rm -rf "${open_webui_data_dir}"
  fi

  if [[ -d "${REPO_ROOT}/models" ]]; then
    log "Removing local models directory: ${REPO_ROOT}/models"
    rm -rf "${REPO_ROOT}/models"
  fi
else
  log "Safe mode enabled (default). Local model/cache data was preserved."
fi

log "Uninstall complete."
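The purge path expands `${USER}`-style references from `.env` with `eval`, which is sensitive to quoting and executes whatever the value contains. A hedged sketch of an eval-free alternative that substitutes only the `${USER}` placeholder via `sed` (the `raw_dir` value and `user_name` fallback are illustrative):

```shell
# Expand a literal '${USER}' placeholder without eval. Note: this only
# handles ${USER}, and assumes user_name contains no sed metacharacters.
user_name="${USER:-demo}"
raw_dir='/home/${USER}/.cache/huggingface'
expanded="$(printf '%s' "${raw_dir}" | sed "s|\${USER}|${user_name}|g")"
echo "${expanded}"
```

The trade-off: `eval` expands any shell syntax in the value (including command substitution), while the `sed` form is inert for everything except the one whitelisted placeholder.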