Matt Rickard

What Diffusion Models Can Teach Us About LLMs

Jun 4, 2023

The image diffusion model ecosystem evolved quickly thanks to the license-friendly, open-source Stable Diffusion models. Now, with LLaMa, Vicuna, Alpaca, RedPajama, Falcon, and many more open-source LLMs, text-generation models are evolving nearly as quickly. Developer tools, infrastructure, and other techniques that originated with diffusion models might eventually come to text-generation LLMs.

LoRA — Low-Rank Adaptation of Large Language Models quickly became the standard way to extend the base Stable Diffusion models. LoRAs became extremely popular for a few reasons:

  • Much smaller file size

  • Faster to fine-tune (more in a hacker’s guide to LLM optimization)

With QLoRA, we’re getting closer to this reality for text-generation LLMs. 
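The core idea behind LoRA can be shown in a few lines. This is a minimal, illustrative sketch (plain Python lists, toy dimensions, made-up values, no real framework): instead of fine-tuning a full weight matrix W (d × k), you train two small matrices B (d × r) and A (r × k) with rank r much smaller than d or k, and apply W' = W + BA.

```python
# Minimal sketch of the LoRA idea. All names, shapes, and values here are
# illustrative; real implementations operate on framework tensors.

def matmul(X, Y):
    """Multiply two matrices represented as lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

d, k, r = 4, 4, 1  # full dims vs. adapter rank (r << min(d, k))

# Frozen base weights (identity, just for illustration).
W = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(d)]
B = [[0.1] for _ in range(d)]       # d x r, trainable
A = [[0.2, 0.0, 0.0, 0.0]]          # r x k, trainable

W_adapted = add(W, matmul(B, A))    # W' = W + BA

# The adapter stores d*r + r*k numbers instead of d*k -- hence the small files.
adapter_params = d * r + r * k      # 8
full_params = d * k                 # 16
```

The parameter count is why LoRA files are so small: at realistic dimensions (say d = k = 4096, r = 8), the adapter is hundreds of times smaller than the full matrix.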

Prompt Matrix — Used to test different parameters for image generation. You might test CFG Scale at a few different values on the X-axis against sampling steps on the Y-axis.

This is starting to happen, except with parameters like temperature on the X-axis and different models on the Y-axis. Or different prompts tested across different models. Why now? Enough models to want to test, and cheap and quick enough to reasonably test multiple models.
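A text-model prompt matrix can be sketched as a simple parameter grid. This is a hypothetical example; `fake_generate` stands in for a real model or API call, and the model names are made up.

```python
from itertools import product

def fake_generate(model, prompt, temperature):
    """Stand-in for a real model call; returns a labeled string."""
    return f"{model}@{temperature}: {prompt[:10]}"

models = ["model-a", "model-b"]     # Y-axis: different models
temperatures = [0.0, 0.7, 1.0]      # X-axis: sampling temperature
prompt = "Explain LoRA in one sentence."

# One cell per (model, temperature) pair -- the "prompt matrix".
grid = {
    (m, t): fake_generate(m, prompt, t)
    for m, t in product(models, temperatures)
}
```

Swapping the axes for different prompts across different models is the same loop with a different `product(...)`.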

Prompt Modifiers / Attention — Using () in the prompt increases the model’s attention to the enclosed words, and [] decreases it. You can also add numeric modifiers, e.g., (word:1.5). There’s no direct equivalent for LLMs, but logit bias is one way to steer them toward a particular result. See ReLLM and ParserLLM.
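The mechanics of logit bias are straightforward to illustrate. In this self-contained sketch (toy vocabulary and values), a constant is added to a token's logit before softmax, shifting probability mass toward it:

```python
import math

def softmax(logits):
    """Convert a dict of logits into a dict of probabilities."""
    m = max(logits.values())
    exps = {t: math.exp(l - m) for t, l in logits.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}

# Three candidate tokens, initially equally likely.
logits = {"yes": 1.0, "no": 1.0, "maybe": 1.0}
bias = {"yes": 2.0}  # steer the model toward "yes"

biased = {t: l + bias.get(t, 0.0) for t, l in logits.items()}
probs = softmax(biased)  # "yes" now dominates
```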

Negative Prompts — LLMs don’t natively support negative prompts (the way Stable Diffusion does). One way to achieve a similar result is, again, through logit bias. 
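A rough text-model analogue of a negative prompt is pushing unwanted tokens' logits strongly negative so they are effectively never sampled. This sketch uses a toy vocabulary; the -100 value mirrors the range some APIs allow for logit bias, but treat the numbers as illustrative.

```python
import math

def softmax(logits):
    m = max(logits.values())
    exps = {t: math.exp(l - m) for t, l in logits.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}

logits = {"cat": 2.0, "dog": 1.5, "ferret": 1.0}
banned = {"ferret"}  # the "negative prompt"

# A large negative bias makes the banned token's probability vanish.
adjusted = {t: (l - 100.0 if t in banned else l) for t, l in logits.items()}
probs = softmax(adjusted)
```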

Loopback — Automatically feed output images as input in the next batch. This is somewhat equivalent to how we’re starting to think about agents in LLMs. 
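The loopback pattern for text is just a loop that chains model calls, each output becoming the next input. A minimal sketch, with `step` standing in for a real model call:

```python
def step(text):
    """Stand-in for a model call; a real version would call an LLM."""
    return text + " ->"

def loopback(prompt, rounds):
    """Feed each output back as the next input, keeping the history."""
    history = [prompt]
    for _ in range(rounds):
        history.append(step(history[-1]))
    return history

runs = loopback("start", 3)
```

An agent loop is this structure plus a stopping condition and, usually, tool calls between steps.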

Checkpoint Merger — There are utilities to merge checkpoints from different models. For example, blend styles, apply multiple LoRAs, and more. However, we haven’t seen this as much in the text-generation models (other than applying the LoRA weights). I’m unsure how well it works, but it's something to look into. 
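The simplest form of checkpoint merging is a weighted average of corresponding parameters. A sketch under simplifying assumptions: checkpoints are plain dicts of floats rather than tensors, and the two models share an architecture (the arithmetic is the same either way).

```python
def merge(ckpt_a, ckpt_b, alpha=0.5):
    """Return alpha * A + (1 - alpha) * B for every shared parameter."""
    assert ckpt_a.keys() == ckpt_b.keys(), "checkpoints must share architecture"
    return {k: alpha * ckpt_a[k] + (1 - alpha) * ckpt_b[k] for k in ckpt_a}

# Toy "checkpoints" -- real ones hold tensors keyed the same way.
a = {"layer.weight": 1.0, "layer.bias": 0.0}
b = {"layer.weight": 3.0, "layer.bias": 2.0}

merged = merge(a, b, alpha=0.25)  # 25% of A, 75% of B
```

Whether naive averaging preserves quality in text models is exactly the open question the paragraph above raises; diffusion UIs expose it as a slider and let users judge the blend visually.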
