We can divide a Data Science project into three broad phases: data management, model training, and analysis. Data management covers cleaning the data, engineering features, and managing databases. Model training applies Machine Learning to build a model of the data. Finally, the results must be analyzed, presenting the model's predictions to the customer in an understandable way, usually through reports or dashboards.
Regarding Machine Learning, practitioners choose different algorithms depending on the problem. In industry, probably the most widely used algorithm in Data Science is the Decision Tree. Curiously, until 2006 the most popular Machine Learning algorithm in the scientific community was the Support Vector Machine. Then 2012 brought one of the most successful breakthroughs in Artificial Intelligence: Deep Learning, which improved on previous performance by more than 20%. Since then, many researchers have turned to Deep Neural Networks. The same has not happened in industry, however: Data Scientists keep using Decision Trees or Logistic Regression for their day-to-day problems.
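To make the day-to-day workflow concrete, here is a minimal sketch of the two classifiers mentioned above, trained with scikit-learn on its built-in breast cancer dataset. The dataset and hyperparameters are illustrative choices, not something prescribed by the article.

```python
# A minimal sketch: Decision Tree vs Logistic Regression on a tabular dataset.
# Dataset and hyperparameters are illustrative, not taken from the article.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

# Load a small tabular classification dataset and hold out a test split.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit both models typically used in everyday Data Science work.
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_train, y_train)
logit = LogisticRegression(max_iter=5000).fit(X_train, y_train)

# Compare held-out accuracy.
print("Decision Tree accuracy:", tree.score(X_test, y_test))
print("Logistic Regression accuracy:", logit.score(X_test, y_test))
```

Both models train in well under a second on a dataset of this size, which is part of why they remain the default choice for everyday tabular problems.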
Here are some of the key use cases:
There is a big business opportunity around Deep Learning, and it will come to Data Science very soon. Furthermore, there is a wide suite of open-source libraries to build on: CNTK from Microsoft, TensorFlow from Google, Keras, Chainer, Theano, and MXNet.
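As a taste of how low the barrier to entry is with these libraries, here is a minimal sketch of a small neural network in Keras (assuming a TensorFlow backend is installed; the synthetic data and layer sizes are illustrative choices, not from the article).

```python
# A minimal Keras sketch: a tiny binary classifier on synthetic data.
# Assumes the TensorFlow backend; data and layer sizes are illustrative.
import numpy as np
from tensorflow import keras

# Synthetic tabular data: label is 1 when the features sum above a threshold.
rng = np.random.default_rng(0)
X = rng.random((100, 4)).astype("float32")
y = (X.sum(axis=1) > 2).astype("float32")

# A small fully connected network with a sigmoid output for binary classification.
model = keras.Sequential([
    keras.layers.Dense(8, activation="relu", input_shape=(4,)),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Train briefly and produce per-sample probabilities.
model.fit(X, y, epochs=5, verbose=0)
preds = model.predict(X, verbose=0)
```

The same few lines scale from this toy example to much larger architectures, which is what makes these libraries attractive once Deep Learning reaches everyday Data Science work.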