TensorFlow vs. PyTorch

Sep 12, 2022

Look at some of the popular machine learning models released in the last few years (YOLOv5, Stable Diffusion), and you'll find they were written in PyTorch, not TensorFlow.

I remember when TensorFlow was released in 2015. Kubernetes was released around the same time. Part of Google's reasoning for open-sourcing both was to avoid repeating the mistakes it made with Hadoop/MapReduce (see Diseconomies of Scale at Google). At the time, many deep learning models (Inception, ResNet, other CNNs, and DNNs) were built with TensorFlow, and the industry rallied around the framework. Facebook released PyTorch a year later.

Since then, PyTorch seems to be growing faster than TensorFlow.

Why did PyTorch seem to win?

A more collaborative project – TensorFlow accepts the occasional outside contribution, but development is led internally by Google. External contributors were often blocked by failing internal tests that they couldn't debug.
An imperative vs. declarative API. TensorFlow 1.x had you declare a static computation graph and then execute it in a session, while PyTorch runs each operation eagerly as ordinary Python. Declarative APIs can sometimes be better optimized and "purer," but imperative APIs are usually simpler to write and debug.
There's much more to machine learning than model design. Arguably, the "hard" part is often everything else: training at scale, debugging, and the deployment pipeline.
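To make the imperative vs. declarative distinction concrete, here is a toy sketch in plain Python. It is not the PyTorch or TensorFlow API: `eager_forward`, `Node`, `placeholder`, and `run` are hypothetical names chosen for illustration. The eager function computes real numbers line by line. The graph version only records operations, and nothing is computed until the finished graph is passed to a separate run step, loosely mirroring the TensorFlow 1.x graph-then-session style.

```python
from __future__ import annotations


# Imperative (eager) style: every operation executes immediately and returns
# a concrete number, so ordinary Python control flow and debugging just work.
def eager_forward(x: float, w: float, b: float) -> float:
    y = x * w          # a real float, inspectable right here
    if y > 0:          # plain Python branch on an actual value
        y = y + b
    return y


# Declarative (graph) style: operations only record nodes; nothing is
# computed until the finished graph is passed to a separate run step.
# All names below are hypothetical, not any real framework's API.
class Node:
    def __init__(self, op: str, *inputs) -> None:
        self.op = op          # "input", "mul", or "add"
        self.inputs = inputs  # child Nodes, or (name,) for an input node

    def __mul__(self, other: Node) -> Node:
        return Node("mul", self, other)

    def __add__(self, other: Node) -> Node:
        return Node("add", self, other)


def placeholder(name: str) -> Node:
    """Hypothetical stand-in for a graph input whose value is fed at run time."""
    return Node("input", name)


def run(node: Node, feed: dict[str, float]) -> float:
    """Recursively evaluate a recorded graph given values for its inputs."""
    if node.op == "input":
        return feed[node.inputs[0]]
    left, right = (run(child, feed) for child in node.inputs)
    return left * right if node.op == "mul" else left + right


# Build the graph for x * w + b; no arithmetic happens on this line.
x, w, b = placeholder("x"), placeholder("w"), placeholder("b")
graph = x * w + b
```

Calling `eager_forward(2.0, 3.0, 1.0)` returns 7.0 directly, while `graph` is just a `Node` until `run(graph, {"x": 2.0, "w": 3.0, "b": 1.0})` also returns 7.0. Note that the eager version's `if y > 0` cannot be written the same way in the graph version: comparing a `Node` to a number would raise a `TypeError` here, which is why TensorFlow 1.x needed dedicated graph ops such as `tf.cond` for data-dependent branches.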

Why might TensorFlow still win?

Facebook does not design its own chips. Google has TPUs, which can be optimized for TensorFlow (and vice versa). Facebook has partnered with companies like Microsoft and AMD on ONNX (Open Neural Network Exchange), an open model format meant to let models move across frameworks and run on many vendors' hardware.
TFLite is still well ahead for deploying models on mobile. Google's organizational knowledge of building and operating Android seems to help.

Matt Rickard
