Matt Rickard

Mixture of Experts: Is GPT-4 Just Eight Smaller Models?

Jun 21, 2023

In a recent interview, George Hotz claimed that GPT-4 is just an eight-way mixture model, with each of the eight models weighing in at about 220B parameters. In other words, it could be a Mixture of Experts (MoE) model. That estimate puts GPT-4 at roughly 1.76 trillion parameters (8 x 220 billion).

Most models reuse the same parameters for all inputs. But Mixture of Experts models use different parameters depending on the example. You end up with a sparsely activated ensemble: only a fraction of the total parameters is used for any given input.
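
To make that concrete, here's a minimal numpy sketch of per-example routing: each input in a batch is sent to a single expert, so only a fraction of the total parameters is touched for any one input. This is only an illustration of the general MoE idea (the router logits are random noise standing in for a learned gating network), not a claim about GPT-4's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, num_experts = 16, 8

# Each "expert" is its own weight matrix; a dense layer would use one shared matrix.
experts = [rng.normal(size=(d_model, d_model)) for _ in range(num_experts)]

def moe_forward(x, router_logits):
    """Route each example to its single highest-scoring expert (top-1),
    so only 1/num_experts of the expert parameters run per example."""
    outputs = np.zeros_like(x)
    chosen = router_logits.argmax(axis=-1)  # one expert index per example
    for i, expert_idx in enumerate(chosen):
        outputs[i] = x[i] @ experts[expert_idx]
    return outputs, chosen

x = rng.normal(size=(4, d_model))                  # a batch of 4 examples
router_logits = rng.normal(size=(4, num_experts))  # stand-in for a learned router
y, chosen = moe_forward(x, router_logits)
print(chosen)  # different examples end up at different experts
```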

Routing between multiple models isn't easy. There's overhead in communicating between the experts and in the routing itself. In Switch Transformers (by researchers at Google), a gating network (typically a small neural network) produces a sparse distribution over the available experts. It might choose only the top-k highest-scoring experts, or use softmax gating, which encourages the network to select only a few experts.
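
Here's a rough numpy sketch of what top-k gating can look like: keep the k highest-scoring experts per example, softmax over just those, and zero out the rest. The details differ across papers (Switch Transformers, for instance, route each token to a single expert), so treat this as an illustration rather than any particular implementation.

```python
import numpy as np

def top_k_gating(router_logits, k=2):
    """Convert router logits into a sparse distribution over experts:
    keep only the top-k logits per example, softmax over those,
    and leave every other expert's weight at zero."""
    gates = np.zeros_like(router_logits)
    topk_idx = np.argsort(router_logits, axis=-1)[:, -k:]  # k best experts per example
    for i, idx in enumerate(topk_idx):
        selected = router_logits[i, idx]
        weights = np.exp(selected - selected.max())
        gates[i, idx] = weights / weights.sum()  # softmax over just the selected experts
    return gates  # shape (batch, num_experts), mostly zeros

logits = np.random.default_rng(1).normal(size=(4, 8))
print(np.round(top_k_gating(logits, k=2), 2))
```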

Getting the balance right is still tricky: the router has to ensure that specific experts aren't chosen too often, or a few experts end up doing all the work while the rest go under-trained.
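
A common way to encourage balance is an auxiliary load-balancing loss added to the training objective. The sketch below roughly follows the formulation in the Switch Transformer paper (the fraction of examples routed to each expert times the mean router probability for that expert, summed over experts); it's a sketch of the idea, not the exact implementation.

```python
import numpy as np

def load_balancing_loss(router_probs, chosen_expert, num_experts):
    """Auxiliary loss in the spirit of the Switch Transformer paper:
    the product of (fraction of examples sent to expert i) and
    (mean router probability for expert i), summed over experts.
    It is smallest when routing is spread evenly across experts."""
    f = np.bincount(chosen_expert, minlength=num_experts) / len(chosen_expert)
    p = router_probs.mean(axis=0)
    return num_experts * np.sum(f * p)

rng = np.random.default_rng(2)
logits = rng.normal(size=(32, 8))  # 32 examples, 8 experts
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
print(load_balancing_loss(probs, probs.argmax(axis=-1), num_experts=8))
```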

Some other interesting facts and implications:

  • GPT-4 costs about 10x as much as GPT-3.

  • GLaM is Google's 1.2T-parameter MoE model with 64 experts.

  • There’s also some analysis on unified scaling laws for routed language models. 

  • Ensembles were some of the most powerful models in earlier eras of machine learning. Maybe that's still the case.
