Matt Rickard

A Hacker's Guide to LLM Optimization

Mar 29, 2023

A bag of tricks to reduce training or inference latency, as well as memory and storage requirements, for large language models.

Compress the model

  • Quantization (post-training) — Normalize and round the weights to a lower-precision format such as int8. No retraining is needed (see the sketch after this list).

  • Mixed precision — Use a combination of lower-precision (e.g., float16) and higher-precision (e.g., float32) arithmetic to balance performance and accuracy.
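
A minimal sketch of post-training quantization, using NumPy and a made-up weight matrix rather than any particular framework: scale float32 weights into int8 and back. Mixed precision is related but different in spirit; there you keep float32 master weights and run most of the math in float16.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric post-training quantization: map float weights onto int8."""
    scale = np.abs(weights).max() / 127.0          # largest weight maps to 127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 weights for inference."""
    return q.astype(np.float32) * scale

# A fake weight matrix: 4x less memory as int8, with a small rounding error.
w = np.random.randn(1024, 1024).astype(np.float32)
q, scale = quantize_int8(w)
print("max abs error:", np.abs(dequantize(q, scale) - w).max())
```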

Fewer computations

  • LoRA (Low-Rank Adaptation of Large Language Models) — A method to cut fine-tuning cost by learning low-rank update matrices on top of the frozen pretrained weights. Faster fine-tuning, and you can share only the LoRA weights (orders of magnitude smaller than a fully fine-tuned model). Often used with Stable Diffusion.
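
A rough sketch of the LoRA idea in NumPy (shapes and rank are illustrative, not taken from the paper): freeze the pretrained weight matrix W and learn only two small factors A and B whose product is the fine-tuning update.

```python
import numpy as np

d, rank = 4096, 8                       # model dimension vs. low-rank bottleneck
W = np.random.randn(d, d) * 0.01        # frozen pretrained weight, never updated

# Trainable LoRA factors: 2 * d * rank parameters instead of d * d.
A = np.random.randn(rank, d) * 0.01
B = np.zeros((d, rank))                 # zero init so the update starts as a no-op

def forward(x: np.ndarray) -> np.ndarray:
    # Frozen path plus the low-rank update path; only A and B would be trained.
    return x @ W.T + x @ (B @ A).T

x = np.random.randn(2, d)
print(forward(x).shape)                                # (2, 4096)
print("full params:", d * d, "LoRA params:", 2 * d * rank)
```

Sharing a fine-tune then means shipping only A and B.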

Prune the model

  • Structured and unstructured pruning use different algorithms to determine which weights can be dropped or ignored at inference time. For example, SparseGPT claims its algorithm can prune models to 50% sparsity without retraining.
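
SparseGPT itself uses a layer-wise solver to pick which weights to drop; as a toy stand-in for the general idea, here is magnitude pruning that zeroes the smallest half of the weights.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float = 0.5) -> np.ndarray:
    """Zero out the smallest-magnitude weights until `sparsity` of them are gone."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) >= threshold, weights, 0.0)

w = np.random.randn(1024, 1024)
pruned = magnitude_prune(w, sparsity=0.5)
print("fraction zeroed:", (pruned == 0).mean())        # roughly 0.5
```

The zeros only translate into real speedups when the runtime or hardware can exploit the resulting sparsity pattern.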

Restrict the domain (fine-tune a smaller model)

  • Task-specific fine-tuning — Fine-tune on a smaller dataset specific to the target task; restricting the domain lets a smaller, cheaper model stand in for a general-purpose one.

  • Model arbitrage — Generate targeted training data from a larger model and use it to train a smaller, task-specific model.
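
A hypothetical sketch of the arbitrage loop; call_large_model and finetune_small_model are stubs standing in for whatever hosted API and training stack you actually use.

```python
# Hypothetical sketch: both functions below are stubs, not a real API.

def call_large_model(prompt: str) -> str:
    """Stub: ask the large, expensive model to label or answer the prompt."""
    return f"large-model output for: {prompt}"

def finetune_small_model(examples: list[dict]) -> None:
    """Stub: fine-tune a small local model on (input, output) pairs."""
    print(f"fine-tuning on {len(examples)} synthetic examples")

def build_training_set(task_inputs: list[str]) -> list[dict]:
    # The large model acts as a labeler: its outputs become the small model's targets.
    return [{"input": x, "output": call_large_model(x)} for x in task_inputs]

task_inputs = ["Classify the sentiment: 'great product'",
               "Classify the sentiment: 'arrived broken'"]
finetune_small_model(build_training_set(task_inputs))
```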

Dispatch to multiple small models

  • Model ensembles — Combine the outputs of multiple smaller models, each specialized in a sub-task, to improve overall performance. A similarity search on embeddings, or some other heuristic, can decide which models to call for a given input.
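
A sketch of embedding-based dispatch: embed the query and pick the specialist whose description is nearest. The embed function below is a fake stand-in (so the routing choice here is arbitrary); with a real embedding model, the closest description wins.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in for a real embedding model; returns a fake deterministic unit vector."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)

# Each specialist model is registered with a description of what it handles.
SPECIALISTS = {
    "sql-model": embed("translate natural language questions into SQL queries"),
    "code-model": embed("write and explain Python code"),
    "summarizer": embed("summarize long documents"),
}

def route(query: str) -> str:
    q = embed(query)
    # Cosine similarity reduces to a dot product because the vectors are unit-norm.
    return max(SPECIALISTS, key=lambda name: float(q @ SPECIALISTS[name]))

print(route("How many users signed up last week?"))    # picks one specialist
```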

Cache the inputs

  • Cache repeated responses

  • Cache semantically similar inputs

  • Precompute common queries
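
A sketch that combines the first two ideas: an exact-match cache keyed on the prompt, plus a semantic fallback that reuses a cached answer when a previous prompt is close enough in embedding space. embed and call_model are placeholders for a real embedding model and a real LLM call.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder embedding; swap in a real embedding model."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)

def call_model(prompt: str) -> str:
    """Placeholder for the expensive LLM call."""
    return f"answer to: {prompt}"

exact_cache: dict[str, str] = {}
semantic_cache: list[tuple[np.ndarray, str]] = []
SIM_THRESHOLD = 0.9  # tune per application

def cached_completion(prompt: str) -> str:
    if prompt in exact_cache:                          # 1. exact repeat
        return exact_cache[prompt]
    q = embed(prompt)
    for vec, answer in semantic_cache:                 # 2. semantically similar prompt
        if float(q @ vec) >= SIM_THRESHOLD:
            return answer
    answer = call_model(prompt)                        # 3. cache miss: pay for inference
    exact_cache[prompt] = answer
    semantic_cache.append((q, answer))
    return answer

print(cached_completion("What is LoRA?"))
print(cached_completion("What is LoRA?"))              # served from the exact cache
```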
