Updates from FFL on new papers, articles, and exciting developments

A New Look: PyTorch and TensorFlow

by Seth

There are a lot of deep learning frameworks.

There’s a lot. Image credit: Andrej Karpathy.

Declaring “winners” or making absolute statements of superiority is futile in this context, but it is clear that two of these frameworks stand apart from the rest in terms of popularity. TensorFlow (from Google) and PyTorch (from Facebook) are the dominant deep learning frameworks (note that Caffe2 is now part of PyTorch and Keras is a high-level API for TensorFlow), and they differ rather starkly from one another. Or rather - they used to. With the announcements this year of PyTorch 1.0 and TensorFlow 2.0, the two frameworks are starting to look more alike.

Static vs. Dynamic

The differences between these two frameworks are most evident in the model-building experience. Building a model in TensorFlow requires defining a computational graph: you must completely define the inputs, outputs, loss functions, and model computations before you can actually use the model or see any results. This approach is aptly named “define-then-run.” In this way, programming in TensorFlow is more like programming in a domain-specific language (DSL). Because you can’t see intermediate results, it can be difficult to debug and unintuitive to use.

In [1]: import tensorflow as tf  # TensorFlow 1.x
   ...: from sklearn.datasets import load_iris

In [2]: dataset = load_iris()
   ...: X = dataset.data[:, :3]
   ...: y = dataset.data[:, 3:4]

In [3]: Xt = tf.placeholder(tf.float32, [None, 3])
   ...: yt = tf.placeholder(tf.float32, [None, 1])
   ...: W = tf.Variable(tf.ones([3, 1]))

In [4]: Xt
Out[4]: <tf.Tensor 'Placeholder:0' shape=(?, 3) dtype=float32>

In [5]: yhat = tf.matmul(Xt, W)
   ...: cost = tf.reduce_mean(tf.square(yhat - yt))  # a symbolic graph node; no value has been computed yet

In [6]: yhat
Out[6]: <tf.Tensor 'MatMul:0' shape=(?, 1) dtype=float32>

In [7]: cost
Out[7]: <tf.Tensor 'Mean:0' shape=() dtype=float32>

In [8]: init = tf.global_variables_initializer()
   ...: sess = tf.Session()
   ...: sess.run(init)
   ...: sess.run(cost, feed_dict={Xt: X, yt: y})
Out[8]: 134.29327

Building a model in PyTorch is much more like traditional programming, since it uses an imperative style. That is, you see the results as you execute the code.

In [1]: import torch
   ...: from sklearn.datasets import load_iris

In [2]: dataset = load_iris()
   ...: X = dataset.data[:, :3]
   ...: y = dataset.data[:, 3:4]

In [3]: Xt = torch.from_numpy(X).float()
   ...: yt = torch.from_numpy(y).float()
   ...: W = torch.randn(3, 1)

In [4]: Xt
Out[4]:
tensor([[5.1000, 3.5000, 1.4000],
        [4.9000, 3.0000, 1.4000],
        ...
        [5.9000, 3.0000, 5.1000]])

In [5]: yhat = torch.mm(Xt, W)

In [6]: yhat
Out[6]:
tensor([[3.2658],
        [1.1384],
        [1.1479],
        ...
        [1.7955]])

In [7]: cost = torch.mean((yt - yhat)**2)

In [8]: cost
Out[8]: tensor(3.1945)

Here, the graph of computations that defines the model is built dynamically, as computations are run. Hence, we call this approach “define-by-run.” This imperative style makes PyTorch programs easier to debug and intermediate results easier to inspect, and it makes the framework less awkward to learn. But this usability comes at a cost.
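For instance (a quick sketch of ours, not from the original session), an ordinary print statement - or a pdb breakpoint - can sit in the middle of a model and inspect activations the moment they are computed:

import torch

# Toy two-layer model; the shapes and names here are illustrative only.
X = torch.randn(4, 3)
W1, W2 = torch.randn(3, 8), torch.randn(8, 1)

h = torch.relu(X @ W1)    # hidden activations exist immediately...
print(h.shape, h.mean())  # ...so plain Python tools can inspect them
# import pdb; pdb.set_trace() would work just as well here
yhat = h @ W2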

With static computational graphs, as in TensorFlow, the model runtime gets a complete view of the model and can perform various optimizations that make it more efficient. Additionally, the model can be handed off to an execution engine written in a lower-level language like C++, where it can run without involving the Python interpreter. With dynamic computational graphs, many of these optimizations become impossible (for example, intermediate results must be kept around for inspection, so their memory cannot be re-used).
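To make “a complete view of the model” concrete, here is a minimal sketch (ours, using the TensorFlow 1.x API) showing that the entire graph can be enumerated - and therefore optimized - before a single value is computed:

import tensorflow as tf  # TensorFlow 1.x

x = tf.placeholder(tf.float32, [None, 3])
W = tf.Variable(tf.ones([3, 1]))
yhat = tf.matmul(x, W)

# The runtime sees every op up front, which is what enables whole-graph
# optimizations such as op fusion and buffer re-use.
graph_def = tf.get_default_graph().as_graph_def()
print([node.op for node in graph_def.node])  # all ops, before any run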

Because of these differences, there is a perception that PyTorch is only good for research and experimentation and that TensorFlow is difficult to use but necessary for speed and productionization. Both communities are hard at work trying to change these narratives.

PyTorch 1.0 and TensorFlow 2.0

The PyTorch 1.0 release is all about improving the production workflow in PyTorch. First, the deep learning library Caffe2, which had a production focus from its inception, has been merged into the PyTorch project. Using the ONNX model serialization format, it will be easy to train a model in PyTorch, export it via ONNX, and import it into Caffe2 for productionization. Second, PyTorch developers have introduced a tracing JIT compiler that makes it easy to turn your PyTorch models into, yes, static computation graphs, à la TensorFlow. Third, a new library version of PyTorch, called libtorch, makes it possible to use PyTorch functionality without any dependence on Python; it can be linked directly into C++ applications.
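As a rough sketch of the tracing JIT (our toy example, reusing the linear model from above), torch.jit.trace runs a function once on example inputs and records the executed ops as a static graph:

import torch

W = torch.randn(3, 1)

def model(X):
    return torch.mm(X, W)  # the same toy linear model as before

# Tracing executes model() once on the example input and records the ops.
traced = torch.jit.trace(model, torch.randn(5, 3))
print(traced.graph)      # a TensorFlow-style static computation graph
traced.save("model.pt")  # serialized form, loadable from C++ via libtorch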

The TensorFlow 2.0 release is all about making TensorFlow more user-friendly and easier to experiment with. TensorFlow will now support eager execution as a first-class citizen, with the goal of making programming in TensorFlow feel just like normal imperative programming. There will be continued investment in the Keras API, which provides a cleaner, more Pythonic programming experience, and a concerted effort to reduce clutter and simplify the package structure.
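For a flavor of what that looks like (a sketch assuming TF 2.0’s eager defaults), the placeholder and Session boilerplate from the earlier example simply disappears:

import tensorflow as tf  # TensorFlow 2.0: eager execution is the default

X = tf.constant([[5.1, 3.5, 1.4],
                 [4.9, 3.0, 1.4]])
W = tf.ones([3, 1])

yhat = tf.matmul(X, W)
print(yhat)  # a concrete value immediately; no Session, no feed_dict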

All this is to say that the PyTorch and TensorFlow communities are attempting to fill the gaps where the other has traditionally excelled. This is great news for both user bases and machine learning practitioners in general. For now, PyTorch will still be a more user-friendly option for research and TensorFlow will still be better for production, but we expect those gaps to close soon enough.

In fact, these two are not the only frameworks that are converging. Google software engineer and Keras creator Francois Chollet recently shared an enlightening tweet highlighting the similarities between four popular deep learning frameworks.

Image credit: Francois Chollet.

This convergence is not surprising. Deep learning has reached a level of maturity where we know what works well and what doesn’t, and where it is clear that the needs of practitioners are diverse, spanning the spectrum from research to production. In the end, what practitioners need are tools that maximize productivity and facilitate the understanding and application of machine learning, not a single tool that is universally “best.”


CFFL Updates

As always, thank you for reading! We welcome your thoughts and feedback; please reach out anytime.

All the best,

Cloudera Fast Forward Labs