The frameworks I want to analyze are TensorFlow, Keras, PyTorch, MXNet, and CNTK. I have used all of them and contributed to MXNet and CNTK.
In terms of adoption, the current ranking is PyTorch > TensorFlow > Keras > MXNet > CNTK. The next image, obtained from this website, shows the percentage of papers using PyTorch versus TensorFlow.
TensorFlow
TensorFlow was initially released by Google Brain in 2015, and it is used across the whole company.
In terms of adoption, Google has produced some of the most impactful recent deep learning advances, like BERT or AlphaZero, so many research labs and companies use TensorFlow. Google was also a pioneer in open-source deep learning, which gave it a first-mover advantage.
The main weakness of TensorFlow is its complexity. It was designed by researchers for researchers; it wasn't a tool for the general machine learning community. In version 2, Google refactored much of the API, getting rid of things like the data placeholders and simplifying the code. Even so, these rough edges and some security issues reduced its adoption.
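To illustrate the simplification, here is a minimal sketch contrasting the TensorFlow 1.x placeholder-and-session style with the eager style of TensorFlow 2 (the computation itself is an arbitrary example):

```python
import tensorflow as tf

# TensorFlow 1.x style: declare placeholders, build a static graph,
# then feed data through a session (shown as comments, since this
# style no longer runs under TensorFlow 2):
#   x = tf.placeholder(tf.float32, shape=(None, 3))
#   y = tf.reduce_sum(x * 2.0)
#   with tf.Session() as sess:
#       print(sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0]]}))

# TensorFlow 2 style: eager execution, no placeholders or sessions.
x = tf.constant([[1.0, 2.0, 3.0]])
y = tf.reduce_sum(x * 2.0)
print(y.numpy())  # 12.0
```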
Keras
Keras was released in 2015 by François Chollet as a high-level deep learning interface, originally running on top of Theano and soon afterwards on TensorFlow. It was designed as a user-friendly, modular, and extensible alternative to writing TensorFlow directly. After Chollet joined Google, Keras became part of TensorFlow.
In terms of adoption, Keras quickly gained a lot of popularity because of its much gentler learning curve. It attracted particular interest in industry.
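To give a sense of that learning curve, here is a minimal sketch of a small classifier in Keras (the architecture and hyperparameters are arbitrary placeholders):

```python
import tensorflow as tf
from tensorflow import keras

# Define a small fully connected classifier, layer by layer.
model = keras.Sequential([
    keras.layers.Dense(128, activation="relu", input_shape=(784,)),
    keras.layers.Dense(10, activation="softmax"),
])

# One call configures the optimizer, loss, and metrics...
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# ...and one call runs the whole training loop (given x_train, y_train):
# model.fit(x_train, y_train, epochs=5, batch_size=32)
```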
The main weakness of Keras was that it took a while to implement some important capabilities, such as multi-GPU and distributed training. So if you wanted to build complex solutions, you still needed to drop down to TensorFlow.
PyTorch
PyTorch was published in 2016 by Facebook AI Research (FAIR). A previous project called Torch, written in the Lua programming language, never achieved much adoption, so FAIR decided to pivot and focus on Python.
In terms of adoption, PyTorch has gained a lot of momentum since 2018 and has become the most used deep learning framework in the research community. PyTorch was designed by developers who clearly mastered the Python language and followed good software engineering practices. In addition to its simplicity, PyTorch had the backing of a top-class research group, which published all of its new research in PyTorch. Other companies, such as Microsoft and Hugging Face, also started to support it, driving further adoption.
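As an example of that simplicity, here is a minimal sketch of a single PyTorch training step (the model, data, and hyperparameters are dummy placeholders):

```python
import torch
import torch.nn as nn

# A tiny model; in practice this would be a real architecture.
model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One explicit, imperative training step: everything is plain Python.
x = torch.randn(32, 784)           # dummy batch of inputs
y = torch.randint(0, 10, (32,))    # dummy labels

optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()   # autograd computes the gradients
optimizer.step()  # the optimizer updates the weights
print(loss.item())
```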
Its main weakness was its late start: because PyTorch arrived after TensorFlow, initially not that many algorithms were implemented in it.
MXNet
MXNet was started by Tianqi Chen from the University of Washington, who is also the creator of XGBoost. Even though it initially had the support of Microsoft (yours truly included), Amazon eventually became its main supporter. Amazon wanted to replicate Google's strategy with TensorFlow and have its own deep learning framework.
In terms of adoption, MXNet gained interest because multi-GPU training was very easy. Tianqi had a lot of experience with distributed systems, so MXNet works natively across multiple GPUs, and it offers a great user experience.
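As an illustration, here is a minimal sketch using MXNet's classic Module API, where multi-GPU training amounts to listing the device contexts (the network itself is an arbitrary placeholder):

```python
import mxnet as mx

# Build a small symbolic network.
data = mx.sym.Variable("data")
net = mx.sym.FullyConnected(data, num_hidden=128, name="fc1")
net = mx.sym.Activation(net, act_type="relu")
net = mx.sym.FullyConnected(net, num_hidden=10, name="fc2")
net = mx.sym.SoftmaxOutput(net, name="softmax")

# Multi-GPU training is just a list of contexts; MXNet splits each
# batch across the devices automatically.
module = mx.mod.Module(symbol=net, context=[mx.gpu(0), mx.gpu(1)])

# Training would then be a single call, given a standard data iterator:
# module.fit(train_iter, num_epoch=10)
```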
Its main weakness was that MXNet didn't have much support from the research community, so no new ML breakthroughs were initially developed in it; instead, MXNet reimplemented algorithms first developed in TensorFlow or PyTorch. Several years ago, Amazon introduced Gluon, a high-level API for MXNet, in an attempt to increase its adoption. I think this was a mistake, because the problem with MXNet was never that it was difficult to use. Instead of spending their time building Gluon, they could have doubled down on the strengths of MXNet, like its distributed capabilities.
CNTK
CNTK was released by Microsoft in April 2015. Initially it used a scripting language called BrainScript; then, in October 2016, version 2 added support for Python and C++. In 2019, Microsoft decided to move away from CNTK and use PyTorch as its main deep learning framework, a decision that I supported.
In terms of adoption, CNTK got an initial push from Microsoft due to its efficiency in distributed training.
In terms of weaknesses, CNTK was designed following the patterns of TensorFlow, which at the time people thought was going to win the deep learning race, so its Python wrapper read a lot like C++. At Microsoft, we are not forced to use a specific framework like at other companies, so not everyone used CNTK. The lack of adoption and the lack of support from the research community made Microsoft pivot and focus on PyTorch.
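For a taste of that C++-flavored style, here is a rough sketch from memory of the CNTK 2.x Python API (exact names and signatures varied across releases, so treat this as illustrative): note how every component is declared and wired explicitly.

```python
import cntk as C

# Inputs are declared up front as typed variables, graph-style.
features = C.input_variable(784)
labels = C.input_variable(10)

# Build the network, then explicitly wire up loss, metric, learner,
# and trainer, one object at a time.
hidden = C.layers.Dense(128, activation=C.relu)(features)
z = C.layers.Dense(10)(hidden)
loss = C.cross_entropy_with_softmax(z, labels)
metric = C.classification_error(z, labels)
lr = C.learning_rate_schedule(0.1, C.UnitType.minibatch)
learner = C.sgd(z.parameters, lr)
trainer = C.Trainer(z, (loss, metric), [learner])
```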
What framework to use
Based on the previous analysis, here is my advice on which framework to use, in order of my personal preference.
My first option is PyTorch. It has strong support from the research community and is very easy to use. For distributed training, the combination of PyTorch and DeepSpeed is really difficult to beat (see the sketch after this list).
As a second option, I recommend using whatever framework the algorithm you need is already implemented in. Basically, if you find a working piece of code, just use it.
My third option is MXNet, because of how easy it is to use and to do multi-GPU training with.
Next I would use Keras, then TensorFlow, and last of all CNTK.
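To make the first recommendation concrete, here is a minimal sketch of wrapping a PyTorch model with DeepSpeed (the model, ZeRO stage, and hyperparameters are illustrative placeholders, not recommendations):

```python
import torch.nn as nn
import deepspeed

# Any ordinary PyTorch model works; this one is a dummy.
model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))

# DeepSpeed is configured with a JSON-style dict.
ds_config = {
    "train_batch_size": 32,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
    "zero_optimization": {"stage": 1},
}

# deepspeed.initialize wraps the model and optimizer for distributed training.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

# Inside the training loop, the engine replaces the usual calls:
#   loss = loss_fn(model_engine(x), y)
#   model_engine.backward(loss)
#   model_engine.step()
# The script is then launched with the `deepspeed` command-line launcher.
```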