Matt Rickard

Text to Image Diffusion Models

blog.matt-rickard.com

May 25, 2022

Researchers have built text-to-image models that generate photorealistic images from nothing but a text prompt, and the results look very convincing.

[Image from Google's Imagen]
[Image from DALL-E]

The first model released was DALL-E by OpenAI, a 12-billion parameter version of GPT-3, followed by DALL-E 2 in April 2022. Google quickly followed up with its own model, Imagen by Google Research, which it claims human reviewers rated better than comparable models.

DALL-E 2 and Imagen are both diffusion models (the original DALL-E generated images autoregressively, token by token). During training, a diffusion model progressively adds noise to the training data until it is all noise. It then learns to reverse the process, removing noise step by step until it can reproduce a noise-free sample. To generate a new image, it starts from pure noise and runs the learned reverse process, guided by the text prompt. You can read a more in-depth summary of the class of models on Google's AI Blog.
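To make the noising half of that process concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the code behind DALL-E or Imagen: the linear noise schedule, the step count, and the helper names (`make_schedule`, `add_noise`, `recover_x0`) are all hypothetical choices, and the small random array stands in for a real training image. No neural network is involved; the last helper only shows that a perfect noise prediction is enough to undo the corruption, which is the quantity a real model is trained to estimate.

```python
import numpy as np

def make_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (assumed values; real models may differ)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    # alpha_bars[t] is the share of the original signal's variance left at step t.
    alpha_bars = np.cumprod(1.0 - betas)
    return betas, alpha_bars

def add_noise(x0, t, alpha_bars, rng):
    """Forward process in closed form: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    eps = rng.standard_normal(x0.shape)
    a = alpha_bars[t]
    xt = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps
    return xt, eps

def recover_x0(xt, eps_pred, t, alpha_bars):
    """Invert the forward step given a prediction of the noise that was added."""
    a = alpha_bars[t]
    return (xt - np.sqrt(1.0 - a) * eps_pred) / np.sqrt(a)

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(8, 8))  # stand-in for a training image
betas, alpha_bars = make_schedule()

# The signal share shrinks toward zero as t grows: by the last step x_t is almost pure noise.
for t in (0, 250, 500, 999):
    xt, eps = add_noise(x0, t, alpha_bars, rng)
    print(f"step {t:4d}: signal share = {alpha_bars[t]:.5f}")

# With the true noise in hand, the clean sample comes back exactly (up to float error).
xt, eps = add_noise(x0, 500, alpha_bars, rng)
x0_hat = recover_x0(xt, eps, 500, alpha_bars)
```

In a trained model, `eps_pred` would come from a network that sees only the noisy image, the step number, and (for text-to-image) an encoding of the prompt, and generation would apply that prediction over many small steps starting from pure noise.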

The research findings from these diffusion models are interesting:

  • Uncurated user-generated data from the web continues to be useful for a wide variety of models

  • Increasing the size of the text-only language model is more effective than increasing the size of the image model, i.e., a language model trained on text alone goes a long way (better text-image alignment, better images).

  • More parameters, better model (even at an enormous scale)

These models are exciting, and it will be interesting to see what use cases people come up with. Much like how AI-powered copywriting didn't displace marketers, text-to-image models will be an asset and a tool for creatives. I imagine just-in-time illustrations for books and engaging illustrations for almost every website.
