WebLLM: A high-performance in-browser LLM inference engine
In this talk, Charlie Ruan from MLC introduces WebLLM, a high-performance in-browser LLM inference engine. WebLLM allows building AI-enabled web apps that are fast (native GPU acceleration via WebGPU), private (100% client-side computation), and convenient (zero environment setup). For developers, WebLLM features an OpenAI-API style interface for standardized integration, supports chat applications and efficient structured JSON generation, and offers built-in support for Web/Service Workers to separate backend execution from the UI flow. The talk covers WebLLM’s key features, overall architecture, and how developers can build AI-enabled web applications with it.
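To give a feel for what an OpenAI-API style interface with structured JSON generation can look like from the application side, here is a minimal TypeScript sketch. It is an illustration under stated assumptions, not WebLLM's own code: the types `ChatMessage`, `ChatRequest`, `ChatResponse`, and `ChatEngine` and the helpers `buildChatRequest` and `ask` are hypothetical names defined locally so the example stays self-contained.

```typescript
// Local stand-in types mirroring the OpenAI chat-completions shape.
// These are illustrative assumptions, not types exported by WebLLM.
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

interface ChatRequest {
  messages: ChatMessage[];
  temperature?: number;
  // When set, asks the engine to constrain its output to valid JSON.
  response_format?: { type: "json_object" };
}

interface ChatResponse {
  choices: { message: { content: string | null } }[];
}

// Hypothetical structural interface: any object exposing
// chat.completions.create in the OpenAI style satisfies it.
interface ChatEngine {
  chat: { completions: { create(req: ChatRequest): Promise<ChatResponse> } };
}

// Builds an OpenAI-style request; wantJson switches on JSON-constrained output.
function buildChatRequest(prompt: string, wantJson = false): ChatRequest {
  const request: ChatRequest = {
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: prompt },
    ],
    temperature: 0.7,
  };
  if (wantJson) {
    request.response_format = { type: "json_object" };
  }
  return request;
}

// Sends the request to whichever engine is supplied and returns the reply text,
// or an empty string if the engine returned no content.
async function ask(engine: ChatEngine, prompt: string, wantJson = false): Promise<string> {
  const reply = await engine.chat.completions.create(buildChatRequest(prompt, wantJson));
  return reply.choices[0]?.message.content ?? "";
}
```

In a real page, the `engine` argument would be an engine object created with the `@mlc-ai/web-llm` package (for example via its `CreateMLCEngine` function) in a WebGPU-capable browser. Keeping `ask` independent of how the engine is constructed means the same call should work whether the engine runs on the main thread or behind a Web Worker.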
Try WebLLM → https://goo.gle/3YluAr9
See more Web AI talks → https://goo.gle/web-ai
Subscribe to Chrome for Developers → https://goo.gle/ChromeDevs
Speaker: Charlie Ruan
Products mentioned: AI for the web, Google Chrome Browser, Chrome Browser Automation, Chrome Extensions, Chrome, Chrome Web Platform, Web AI, Web apps, Web Assembly (Wasm), Web Platform in Chrome, WebAssembly for Chrome, WebGPU, CodeGemma, Gemma 2, Gemma, RecurrentGemma, Generative AI, AI, Google AI, Google AI Edge, Responsible AI, Kaggle Models, LiteRT, TensorFlow, Hugging Face Models