Implementing LLMs in the Browser
LLMs are coming to the browser. Inference there is still really slow, but running the computation client-side is much cheaper for whoever is serving the model. And the browser is the ultimate delivery mechanism: no downloading packages, setting up a programming environment, or getting an API key. Of course, client-side models won't be used for everything. Initially, expect just testing, playgrounds, and freemium getting-started experiences for products.
There are generally two strategies for getting LLMs working in the browser:
Compile C/C++ or Rust to WebAssembly. Take a fairly vanilla library like ggml and use emscripten to compile it to WebAssembly (examples: a Wasm fork of ggml, WasmGPT). Optionally, target the new WebGPU runtime, as WebLLM does.
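To make the last step of that pipeline concrete, here is a minimal sketch of how JavaScript loads and calls a compiled WebAssembly module. The bytes below are a tiny hand-assembled module exporting a hypothetical add function, used purely as a stand-in for compiler output. It is not ggml, and a real emscripten build would typically ship with its own JavaScript glue code and a much larger binary.

```typescript
// Hand-assembled WebAssembly binary exporting add(i32, i32) -> i32.
// Illustrative stand-in for real compiler output; NOT ggml or any LLM code.
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // magic "\0asm"
  0x01, 0x00, 0x00, 0x00, // binary format version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type section: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // function section: one function using type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export section: "add" -> function 0
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // code: local.get 0, local.get 1, i32.add, end
]);

// Synchronous compilation is fine for a module this small. A real model
// runtime would more likely be fetched and compiled asynchronously, for
// example with WebAssembly.instantiateStreaming.
const wasmModule = new WebAssembly.Module(wasmBytes);
const wasmInstance = new WebAssembly.Instance(wasmModule, {});

// Exports are untyped at this boundary, so the signature is asserted by hand.
const add = wasmInstance.exports.add as unknown as (a: number, b: number) => number;

console.log(add(2, 3)); // 5
```

The same pattern scales up: the page downloads a binary, instantiates it, and calls exported functions, with model weights passed in through the module's linear memory.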
Now, combine a WebAssembly LLM in the browser with a WebAssembly Python interpreter in the browser, and you might get some interesting applications that are sandboxed by default.
WebGPU will ship in Chrome on May 2nd. Unlike WebGL, WebGPU exposes more advanced GPU features and general-purpose computation primitives such as compute shaders.
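Since WebGPU won't be available everywhere at first, an in-browser runtime probably needs to check what the page can use before loading a model. Here is a rough sketch of that check, assuming a simple preference order of WebGPU first and plain WebAssembly second. The names Backend, Capabilities, readCapabilities, and pickBackend are made up for illustration; only navigator.gpu and the global WebAssembly object are real browser APIs.

```typescript
type Backend = "webgpu" | "wasm" | "none";

// Hypothetical capability record, kept separate so the choice is easy to test.
interface Capabilities {
  gpu?: unknown;  // value of navigator.gpu; undefined when WebGPU is unavailable
  wasm?: unknown; // the global WebAssembly object; undefined when unsupported
}

// Read capabilities from whatever environment the script is running in.
function readCapabilities(): Capabilities {
  const g = globalThis as any;
  return { gpu: g.navigator?.gpu, wasm: g.WebAssembly };
}

// Prefer WebGPU, fall back to CPU-only WebAssembly, otherwise give up.
// Note: navigator.gpu being present does not guarantee a usable adapter;
// real code would still need to await navigator.gpu.requestAdapter().
function pickBackend(caps: Capabilities = readCapabilities()): Backend {
  if (caps.gpu) return "webgpu";
  if (caps.wasm) return "wasm";
  return "none";
}

console.log(pickBackend());
```

Passing the capabilities in as an argument keeps the decision logic independent of the browser, which makes the fallback order easy to change or test.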