WebLLM is an in-browser, high-performance language model inference engine. It uses WebGPU to accelerate the hardware, enabling powerful LLM functions directly within web browsers, without server-side processing. It is compatible with the OpenAI API, allowing seamless integration of functionalities like JSON mode, function calling, and streaming. WebLLM supports a wide range of models including Llama Phi Gemma Mistral Qwen and RedPajama. Users can easily integrate custom models into MLC format and adapt WebLLM to their specific needs and scenarios. The platform allows for plug-and play integration via package managers such as NPM and Yarn or directly through CDN. It also includes comprehensive examples and a module design to connect with UI components. It supports real-time chat completions, which enhance interactive applications such as chatbots and virtual assistances.