# Architecture

## Component flow

```text
[Browser @ chat.bhatfamily.in]
        |
        | HTTPS (terminated externally)
        v
[Host reverse proxy (external to this repo)]
        |
        | HTTP -> localhost:3000
        v
[chat-ui container: Open WebUI]
        |
        | HTTP (docker internal network)
        v
[gemma3-vllm container: vLLM OpenAI API @ :8000/v1]
        |
        | reads model weights/cache
        v
[Hugging Face cache + local models dir]
        |
        | ROCm runtime
        v
[AMD Radeon 780M (RDNA3 iGPU) via /dev/kfd + /dev/dri]
```

## Services

### `gemma3-vllm`

- Image: `vllm/vllm-openai-rocm:latest`
- Purpose: run the Gemma 3 instruction model behind an OpenAI-compatible API.
- Host port mapping: `${BACKEND_PORT}:8000` (default `8000:8000`)
- Device passthrough:
  - `/dev/kfd`
  - `/dev/dri`
- Security/capabilities for ROCm debugging compatibility:
  - `cap_add: SYS_PTRACE`
  - `security_opt: seccomp=unconfined`
  - `group_add: video`

### `chat-ui`

- Image: `ghcr.io/open-webui/open-webui:main`
- Purpose: browser chat experience with local persistence in the mounted data directory.
- Host port mapping: `${FRONTEND_PORT}:8080` (default `3000:8080`)
- Upstream model endpoint on the docker network:
  - `OPENAI_API_BASE_URL=http://gemma3-vllm:8000/v1`

## Networking

- The Docker Compose default bridge network is used.
- `chat-ui` resolves `gemma3-vllm` by service name.
- External access is via host ports:
  - API: `localhost:8000`
  - UI: `localhost:3000`

## Storage

- Hugging Face cache bind mount:
  - Host: `${HUGGINGFACE_CACHE_DIR}`
  - Container: `/root/.cache/huggingface`
- Optional local models directory:
  - Host: `./models`
  - Container: `/models`
- Open WebUI data:
  - Host: `${OPEN_WEBUI_DATA_DIR}`
  - Container: `/app/backend/data`

## Scaling notes

This repository is designed for **single-node deployment** on one AMD APU/GPU host. For larger deployments later:

- Move to dedicated GPUs with larger VRAM.
- Use pinned vLLM image tags and explicit engine tuning.
- Consider externalized model storage and distributed orchestration (Kubernetes/Swarm/Nomad).
- Add request routing, autoscaling, and centralized observability.
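
## Example: calling the backend API

Because `gemma3-vllm` exposes an OpenAI-compatible endpoint, any OpenAI-style client can talk to it directly on the host port. The sketch below uses the `openai` Python package against the default `localhost:8000` mapping; the model id shown is an assumption, so substitute whatever id `GET /v1/models` actually reports for your deployment.

```python
# Minimal sketch: query the vLLM OpenAI-compatible API exposed on the host.
# Assumptions: the `openai` package is installed, BACKEND_PORT is the default 8000,
# and no API key is enforced by the backend (a placeholder key is passed).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    # Assumption: replace with the model id reported by GET /v1/models.
    model="google/gemma-3-4b-it",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```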
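For a quick smoke test of the two host ports listed under Networking, the standard library is enough. This assumes the default ports (`BACKEND_PORT=8000`, `FRONTEND_PORT=3000`); it only checks that both containers answer HTTP.

```python
# Smoke-test sketch: confirm the API and UI host ports respond.
# Assumes default host port mappings (API on 8000, UI on 3000).
import json
import urllib.request

# The OpenAI-compatible server lists its served models at /v1/models.
with urllib.request.urlopen("http://localhost:8000/v1/models") as resp:
    models = json.load(resp)
    print("API models:", [m["id"] for m in models.get("data", [])])

# Open WebUI serves the browser UI at the root path.
with urllib.request.urlopen("http://localhost:3000/") as resp:
    print("UI status:", resp.status)
```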