LocAI is a fully client-side AI chat application that lets you download, run, and switch between multiple large language models directly in your browser—no signup or remote server required. It caches model weights and conversation history in IndexedDB, uses a Service Worker for offline support, and leverages WebGPU for accelerated inference. Every chat stays private on your device, and your last-used model, open conversation, and slider settings persist across reloads.
- Client-Side Inference: Download and run large language models entirely in your browser via WebGPU—no remote servers or API keys required. A loading-and-streaming sketch follows this list.
- Multi-Model Support: Browse, select, and switch between a growing catalog of MLC-AI models (e.g. Qwen2, Llama-3, Phi-3, Gemma-2) from the https://mlc.ai/models directory.
- Offline-First: A Service Worker caches assets and model weights for offline use; once downloaded, models load instantly without a network connection.
- Persistent Chats: Conversations and messages are stored in IndexedDB; your chat history, open conversation, and slider settings persist across page reloads.
- Real-Time Streaming: Partial responses stream in with live rendering and auto-formatted code blocks; dynamic sliders control temperature, top-p, penalties, max tokens, and choice count.
- WebGPU Enforcement: Detects and blocks unsupported browsers (Firefox, Safari) and guides users to Chrome/Chromium with WebGPU enabled. See the detection sketch after this list.
- Responsive UI: Modernized design with custom components (no external UI library), an adaptive layout for desktop and mobile, and accessible modals for Info, Model Selection, Advanced Settings, and WebGPU errors.
- Session Restore: The last-used model and last open chat are automatically reloaded from localStorage for seamless restarts. See the persistence sketch after this list.
- Privacy & Security: All data remains on-device—clearing browser storage removes everything, ensuring zero-trust, zero-signup privacy.
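For concreteness, the sketch below shows roughly how loading and streaming work with @mlc-ai/web-llm, the inference library LocAI is built on. The model ID, generation parameters, and surrounding wiring are illustrative assumptions, not LocAI's actual code.

```typescript
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// Illustrative sketch: model ID and parameters are examples; LocAI's wiring
// may differ. The first call downloads and caches the weights; later loads
// come from browser storage, which is why models start instantly once cached.
const engine = await CreateMLCEngine("Llama-3-8B-Instruct-q4f32_1-MLC", {
  initProgressCallback: (report) => console.log(report.text), // download/compile progress
});

// OpenAI-style streaming chat completion: render each delta as it arrives.
const stream = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Hello!" }],
  stream: true,
  temperature: 0.7, // the kind of knob the Advanced Settings sliders expose
  top_p: 0.9,
});

for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content ?? "";
  // append `delta` to the message currently being rendered
}
```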
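The WebGPU enforcement implies a capability probe along these lines; this is a minimal sketch using only the standard navigator.gpu API, and the function name and handling are assumptions.

```typescript
// Hypothetical capability check (illustrative, not LocAI's actual guard).
// `navigator.gpu` is the standard WebGPU entry point; it is undefined in
// browsers without support, and requestAdapter() resolves to null when no
// usable GPU adapter exists.
type GPULike = { requestAdapter(): Promise<unknown | null> };

async function hasWebGPU(): Promise<boolean> {
  const gpu = (navigator as Navigator & { gpu?: GPULike }).gpu;
  if (!gpu) return false; // e.g. Firefox/Safari without WebGPU
  return (await gpu.requestAdapter()) !== null;
}

// Usage: gate the app and show the WebGPU error modal when the check fails.
hasWebGPU().then((ok) => {
  if (!ok) console.warn("WebGPU unavailable; please use Chrome/Chromium.");
});
```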
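And a minimal sketch of the session-restore behavior; the storage keys are invented for illustration, and LocAI's actual keys and shape will differ.

```typescript
// Hypothetical persistence helpers (key names are invented for illustration).
const LAST_MODEL_KEY = "locai:last-model";
const LAST_CHAT_KEY = "locai:last-chat";

export function saveSession(modelId: string, chatId: string): void {
  localStorage.setItem(LAST_MODEL_KEY, modelId);
  localStorage.setItem(LAST_CHAT_KEY, chatId);
}

// On startup, restore the last-used model and open conversation, if any.
export function restoreSession(): { modelId: string | null; chatId: string | null } {
  return {
    modelId: localStorage.getItem(LAST_MODEL_KEY),
    chatId: localStorage.getItem(LAST_CHAT_KEY),
  };
}
```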
locai.mp4
A more extensive showcase is available in my portfolio!
- Astro 5: Static-site generator powering the modern, lightning-fast frontend.
- React 19: UI library for building interactive chat components.
- TypeScript: Adds static typing across components, hooks, and data models.
- IndexedDB: Client-side storage of model weights and conversation history.
- Service Worker (PWA): Offline caching of assets and model files. See the caching sketch after this list.
- LocalStorage: Persists the last-used model, open chat, and slider settings.
- @mlc-ai/web-llm: Library for loading and running LLMs in the browser.
- Framer Motion: Animations and transitions for a polished UX.
- react-markdown + remark-gfm: Rendering of Markdown-formatted responses and code blocks. See the rendering sketch after this list.
- react-syntax-highlighter: Code formatting and highlighting during streaming.
- GitHub Actions: CI/CD workflows for builds, tests, and deploys to GitHub Pages.
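As a rough illustration of the Service Worker strategy, here is a cache-first fetch handler; the cache name and matching rules are assumptions, not LocAI's actual worker.

```typescript
/// <reference lib="webworker" />
// sw.ts: minimal cache-first sketch of the offline strategy described above.
// The cache name and caching rules are illustrative; LocAI's actual service
// worker may handle assets and model shards differently.
declare const self: ServiceWorkerGlobalScope;
export {};

const CACHE = "locai-cache-v1"; // hypothetical cache name

self.addEventListener("fetch", (event: FetchEvent) => {
  event.respondWith(
    caches.open(CACHE).then(async (cache) => {
      const cached = await cache.match(event.request);
      if (cached) return cached; // serve from cache when offline or on repeat visits
      const response = await fetch(event.request);
      if (event.request.method === "GET" && response.ok) {
        cache.put(event.request, response.clone()); // store for next time
      }
      return response;
    }),
  );
});
```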
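And a sketch of how react-markdown, remark-gfm, and react-syntax-highlighter typically compose; prop shapes vary across react-markdown versions, so the component below shows the general pattern rather than LocAI's implementation.

```tsx
import ReactMarkdown from "react-markdown";
import remarkGfm from "remark-gfm";
import { Prism as SyntaxHighlighter } from "react-syntax-highlighter";

// Illustrative message renderer (an assumption about the usual pattern,
// not LocAI's actual component).
export function AssistantMessage({ text }: { text: string }) {
  return (
    <ReactMarkdown
      remarkPlugins={[remarkGfm]} // tables, strikethrough, task lists, etc.
      components={{
        code({ className, children }) {
          const lang = /language-(\w+)/.exec(className ?? "")?.[1];
          return lang ? (
            // Fenced block: hand off to the syntax highlighter.
            <SyntaxHighlighter language={lang}>
              {String(children).replace(/\n$/, "")}
            </SyntaxHighlighter>
          ) : (
            <code className={className}>{children}</code>
          );
        },
      }}
    >
      {text}
    </ReactMarkdown>
  );
}
```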
- Docker
- Clone the repository:

  ```bash
  git clone https://github.com/vladimircuriel/locai-chat
  ```

- Navigate to the project directory:

  ```bash
  cd locai-chat
  ```

- Build and run the Docker image:

  ```bash
  docker build -t locai:latest .
  docker run -p 4321:4321 locai:latest
  ```

- Access the application: open your browser and visit http://localhost:4321 to reach the user interface.
- Models must be re-downloaded when switching quantization or major versions, leading to extra wait and disk use.
- Very large models can exhaust device RAM/VRAM and may crash or hang browsers on lower-end hardware.
- Chat history in IndexedDB is unencrypted and tied to the browser; no built-in export or sync across devices.
- Mobile browsers with experimental or missing WebGPU support are blocked entirely—no lightweight fallback.
- No built-in search or filtering within long conversation histories.
- Context window is limited by model token capacity; very long chats may lose early context when truncated (see the sketch after this list).
- No support for custom prompts or system-level instruction templates beyond a single “system” message.
- No collaborative or multi-user features—every session is isolated to the local device.
- Lack of model fine-tuning or personalization options; you’re limited to public pre-trained checkpoints.
- Clearing browser storage (IndexedDB/localStorage) deletes all chats and model caches without warning.
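To make the truncation limitation concrete, here is a naive sketch of budget-based context trimming; the token heuristic and function names are invented for illustration and are not LocAI's actual logic.

```typescript
// Illustrative only: keep the most recent messages that fit a token budget,
// dropping the oldest first. Token counting here is a crude approximation.
type Message = { role: "system" | "user" | "assistant"; content: string };

function approxTokens(text: string): number {
  return Math.ceil(text.length / 4); // rough heuristic: ~4 characters per token
}

function fitToContext(messages: Message[], maxTokens: number): Message[] {
  const kept: Message[] = [];
  let used = 0;
  // Walk from newest to oldest so early context is what gets dropped.
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = approxTokens(messages[i].content);
    if (used + cost > maxTokens) break;
    kept.unshift(messages[i]);
    used += cost;
  }
  return kept;
}
```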