Matt Rickard

The Truth About GPU Utilization

blog.matt-rickard.com

Aug 19, 2023

If I am trying to sway others, I would say that an org that has only known inefficiency is ill prepared for the inevitable competition and/or belt tightening, but really, it is the more personal pain of seeing a 5% GPU utilization number in production. I am offended by it. — John Carmack’s resignation letter from Meta

The truth is that GPUs, CPUs, RAM, and every other compute resource probably sit below 50% utilization in almost any organization. There are plenty of exceptions (training jobs, for example), but this is the norm, and there are several reasons for it.

Supply is not elastic. Companies that run their own hardware (e.g., Meta) can't procure GPUs out of thin air. It takes time to build and deploy data centers and hardware.

Scaling latency. Even in cloud environments, it's tough to scale one-to-one with demand. New capacity takes time to come online, so even the best predictive and optimized algorithms can't track the demand curve exactly.
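To make the lag concrete, here is a toy simulation. It is a sketch under made-up assumptions, not a model of any real autoscaler: the `simulate` function, the `headroom` multiplier, and the demand numbers are all hypothetical. Capacity at each step is sized from the demand observed `lag` steps earlier.

```python
import math

def simulate(demand, lag, headroom=1.0):
    """Reactive scaling: capacity at step t is sized from demand seen `lag` steps ago.

    Returns (utilization, unmet), where utilization is served work divided by
    provisioned capacity and unmet is demand that found no capacity.
    """
    served_total = capacity_total = unmet = 0
    for t, want in enumerate(demand):
        observed = demand[max(0, t - lag)]        # stale view of demand
        capacity = math.ceil(observed * headroom)  # over-provision by `headroom`
        served = min(want, capacity)
        served_total += served
        capacity_total += capacity
        unmet += want - served
    return served_total / capacity_total, unmet

demand = [10, 10, 40, 40, 10, 10]  # hypothetical GPUs needed per time step

print(simulate(demand, lag=0))                # (1.0, 0)
print(simulate(demand, lag=1))                # (0.75, 30)
print(simulate(demand, lag=1, headroom=4.0))  # (0.25, 0)
```

With no lag, utilization is perfect. With one step of lag, the spike goes partly unserved and the extra capacity shows up just in time to sit idle. The only way to serve the spike in this toy setup is to hold 4x headroom all the time, which drops utilization to 25%.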

Underprovisioning breaks workloads. Out-of-memory errors are notoriously hard to track down. They can seemingly come out of nowhere. Working but unoptimized code can bring down production in mysterious ways.

Organizational constraints. Resources are hard to share equally. Some teams might have more administrative power in acquiring (and protecting) resources. The idea of an internal resource economy has been tried (there was one at Google), but it almost always devolves into a power struggle.

Software constraints. Not all software can fully utilize the hardware. Think of bin-packing. Even with the best algorithms, there might not be enough right-sized workloads to fit into predetermined hardware boxes. 
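A small sketch of the bin-packing point, using first-fit decreasing (a standard heuristic, chosen here for illustration). The job sizes and the 80 GB device capacity are hypothetical numbers, and the function names are mine.

```python
def first_fit_decreasing(demands, capacity):
    """Pack demands into fixed-capacity bins; returns a list of bins (lists of demands)."""
    bins = []
    for d in sorted(demands, reverse=True):
        if d > capacity:
            raise ValueError(f"demand {d} exceeds bin capacity {capacity}")
        for b in bins:
            if sum(b) + d <= capacity:  # first bin with enough room
                b.append(d)
                break
        else:
            bins.append([d])            # no bin fits, open a new one
    return bins

def utilization(bins, capacity):
    """Fraction of total provisioned capacity that is actually used."""
    used = sum(sum(b) for b in bins)
    return used / (len(bins) * capacity)

# Hypothetical jobs: GPU memory in GB, packed onto 80 GB devices.
jobs = [45, 45, 45, 45, 30, 30]
bins = first_fit_decreasing(jobs, capacity=80)

print(bins)                   # [[45, 30], [45, 30], [45], [45]]
print(utilization(bins, 80))  # 0.75
```

No algorithm can do better on this input: two 45 GB jobs never fit on one 80 GB device, so four devices are required, and a quarter of the memory stays idle no matter how the jobs are arranged.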

4 Comments

kristina
Aug 19

Further compounded by GPUs not being great at being shared between users. NVIDIA has a couple of GPUs that can be shared between non-cooperative workloads, but it's pretty limited (especially compared to how good CPUs have gotten at that).

Andrew Smith
Aug 19

Whoa, I had no idea Carmack had taken on a stint at Meta. That guy is a living legend.

2 more comments...