Architecture
Component flow
```
[Browser @ chat.bhatfamily.in]
        |
        | HTTPS (terminated externally)
        v
[Host reverse proxy (external to this repo)]
        |
        | HTTP -> localhost:3000
        v
[chat-ui container: Open WebUI]
        |
        | HTTP (docker internal network)
        v
[gemma3-vllm container: vLLM OpenAI API @ :8000/v1]
        |
        | reads model weights/cache
        v
[Hugging Face cache + local models dir]
        |
        | ROCm runtime
        v
[AMD Radeon 780M (RDNA3 iGPU) via /dev/kfd + /dev/dri]
```
Services
gemma3-vllm
- Image: `vllm/vllm-openai-rocm:latest`
- Purpose: Run the Gemma 3 instruction model behind an OpenAI-compatible API.
- Host port mapping: `${BACKEND_PORT}:8000` (default `8000:8000`)
- Device passthrough: `/dev/kfd`, `/dev/dri`
- Security/capabilities for ROCm debugging compatibility: `cap_add: SYS_PTRACE`, `security_opt: seccomp=unconfined`, `group_add: video`
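Taken together, these settings correspond to a compose service roughly like the sketch below. It is assembled only from the bullets above and omits the model launch arguments and any engine tuning present in the repository's actual `docker-compose.yml`:

```yaml
# Sketch only -- not the repository's actual compose file.
services:
  gemma3-vllm:
    image: vllm/vllm-openai-rocm:latest
    ports:
      - "${BACKEND_PORT}:8000"     # default 8000:8000
    devices:
      - /dev/kfd:/dev/kfd          # ROCm compute interface
      - /dev/dri:/dev/dri          # GPU render nodes
    group_add:
      - video                      # host group that owns the GPU devices
    cap_add:
      - SYS_PTRACE                 # ROCm debugging compatibility
    security_opt:
      - seccomp=unconfined
```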
chat-ui
- Image: `ghcr.io/open-webui/open-webui:main`
- Purpose: Browser chat experience with local persistence in the mounted data directory.
- Host port mapping: `${FRONTEND_PORT}:8080` (default `3000:8080`)
- Upstream model endpoint on the docker network: `OPENAI_API_BASE_URL=http://gemma3-vllm:8000/v1`
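A matching sketch for the frontend service; `depends_on` is an illustrative addition for startup ordering and is not confirmed from the repository:

```yaml
# Sketch only -- not the repository's actual compose file.
services:
  chat-ui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "${FRONTEND_PORT}:8080"    # default 3000:8080
    environment:
      # Reaches the backend by compose service name on the internal network.
      - OPENAI_API_BASE_URL=http://gemma3-vllm:8000/v1
    depends_on:
      - gemma3-vllm                # illustrative; ordering only, not a health gate
```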
Networking
- Docker Compose's default bridge network is used; `chat-ui` resolves `gemma3-vllm` by service name.
- External access is via host ports:
  - API: `localhost:8000`
  - UI: `localhost:3000`
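The host-side ports come from the `${BACKEND_PORT}` and `${FRONTEND_PORT}` variables used in the port mappings above. A sketch of the corresponding `.env` entries with the documented defaults (whether the repository actually ships a `.env` file or expects these in the shell environment is an assumption here):

```
# vLLM OpenAI API -> localhost:8000
BACKEND_PORT=8000
# Open WebUI -> localhost:3000
FRONTEND_PORT=3000
```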
Storage
- Hugging Face cache bind mount:
  - Host: `${HUGGINGFACE_CACHE_DIR}`
  - Container: `/root/.cache/huggingface`
- Optional local models directory:
  - Host: `./models`
  - Container: `/models`
- Open WebUI data:
  - Host: `${OPEN_WEBUI_DATA_DIR}`
  - Container: `/app/backend/data`
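A sketch of how these bind mounts attach to the two services; assigning the Hugging Face cache and `./models` to `gemma3-vllm` follows the component flow above but is otherwise an assumption:

```yaml
# Sketch of the bind mounts only.
services:
  gemma3-vllm:
    volumes:
      - ${HUGGINGFACE_CACHE_DIR}:/root/.cache/huggingface   # downloaded model weights
      - ./models:/models                                    # optional local models
  chat-ui:
    volumes:
      - ${OPEN_WEBUI_DATA_DIR}:/app/backend/data            # chat history and settings
```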
Scaling notes
This repository is designed for single-node deployment on one AMD APU/GPU host.
For larger deployments later:
- Move to dedicated GPUs with larger VRAM.
- Use pinned vLLM image tags and explicit engine tuning.
- Consider externalized model storage and distributed orchestration (Kubernetes/Swarm/Nomad).
- Add request routing, autoscaling, and centralized observability.