Miguel González-Fierro

June 5, 2016

Before starting to address the subject of Machine Learning, it is essential to understand first what is mathematics and why they are so important. Math is the language of science. It allows scientist and engineers to express the Universe and the phenomenons that happen inside it.

As any language, it has tenses: past, present and future, so we can explain what happened in the past, we can describe what is happening in the present, and we can predict what will happen in the future. The most useful tense in math is the future, i.e. the prediction of events.

We are very interested in predicting the future, or knowing how a system will evolve over time starting from an initial state. We can predict the future trajectory of a meteorite to know if it will hit the Earth (so we can call Bruce Willis) or we can predict the sales of a product in Amazon (so we can make Jeff Bezzos happy). Anyhow, predicting is important. and Machine Learning is a really good way of predicting things.

In order to play with math we need numbers and variables (which are just letters that represent different values). These numbers and variables create the most important mathematical resource to talk about the Universe, the model. A model is just an equation that represents a concept or event in the mathematical language. Some famous models are $x^2+y^2=r^2$, which means a circle of radius $r$; $F=ma$, also known as Newton's Second Law of Motion, it states that the force, $F $, is proportional to the acceleration, $a$, in a quantity called mass, $m$; or the famous Einstein equation $E=mc^2$, which states that all mass, $m$, has an amount of energy, $E$, proportional to the square of the speed of light, $c$.

There are two kind of models, analytical models, like the ones previously mentioned and learned models. Until the computer era, most scientists used analytical models, mainly because in order to develop learned models it is necessary to compute many operations. Other significant difference between analytical and learned models is that, usually, analytical models are derived using a mathematical demonstration, which are a set of steps that reach a final equation. On the contrary, learned models are obtained via a process called training or optimization, and need many examples of inputs-outputs to obtain the equation. For this reason, analytic models tend to be compact and beautifull equations, while learned models tend to have hundreds of parameters, generating a very complex and large equation.

Machine Learning is a technique that allows to obtain learned models to generate predictions. It has a huge advantage in comparison with analytic models, you don't need to derive a formula, therefore you don't really need to have a deep understanding of math.

The only thing you need to create a machine learning model is a group of inputs that generate a group of outputs. The model is just the relationship between inputs and outputs.

In the image, the inputs correspond to the points $(x,y)$, the ouputs are the color red and blue, meaning that the inputs are divided in two different classes, and the model is the line that divides both classes. This specific problem is called classification.

The model we obtained follow this equation, $y=wx+b$, where $w$, is the slope, and $b$ is the bias, they are both numbers. In this case the equation is: $y=-1.3x+1.6$

Once we have the model, we can make predictions. An example is, given a new input point $(x,y)=(0.2,0.3)$, what would be the output? Since the point is below the line, then the output would be the class blue.

The last key concept to understand is the concept of optimization, which in the machine learning nomenclature is called training.

To explain the concept of optimization let's imagine a football player taking a penalty like in the next image. The player hits the ball with the intention that it follows a trajectory that ends up in goal, but many times the ball doesn't follow the trajectory the player desired. So there are two trajectories, the desired one and the actual one.

Optimization is the process of tuning the parameters of the model to make the desired and the actual output as similar as possible. The parameters of the model, in the case of the previous equation, are $w$ and $b$. The difference between the desired and actual output is called the error. Therefore, optimization is equivalent to minimize the error.

The training process of a machine learning algorithm is the optimization of the parameter's model so the desired output (which is the output we know from the data), and the actual output (which is the output predicted by the model equation) are as similar as possible. This process is simulated in the first image, when the line is moving. Initially the line doesn't divide the classes very well, there is a high error. Later, as the parameters change, the line divides the classes perfectly, reaching a minimal error.

There are many ways to train a machine learning model, such as gradient descent or genetic algorithms, and there are many different algorithms, such as support vector machines, neural networks or random forests. But they all follow the same basic principles addressed in this post. All this compose the big world of machine learning algorithms.

blog comments powered by Disqus