The starting point of a Recommendation System is a set of past interactions between users and items: each user has interacted with (clicked, bought, or rated) only a small number of items.

We can arrange these interactions in an interaction matrix: rows are users and columns are items. If there has been an interaction between a user-item pair, the corresponding cell holds its value (for example, a rating); otherwise, it is zero.
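As a concrete sketch, an interaction matrix can be built from a list of (user, item, value) triples; the data below is made up purely for illustration:

```python
import numpy as np

# Hypothetical interaction triples: (user_id, item_id, rating).
interactions = [(0, 0, 5.0), (0, 2, 3.0), (1, 1, 4.0), (2, 0, 1.0), (2, 3, 2.0)]

n_users, n_items = 3, 4
R = np.zeros((n_users, n_items))
for u, i, r in interactions:
    R[u, i] = r  # observed interaction; unobserved cells stay 0
```

Most cells remain zero, which is why the interaction matrix is typically stored in a sparse format in practice.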

The recommendation system's goal is to predict the unknown values in the user-item interaction matrix. In other words, predict the interaction each user will have with each item. This can be achieved with matrix factorization.

Basically, a matrix factorization algorithm decomposes the interaction matrix into two lower-dimensional matrices that represent the intrinsic user and item attributes, so that each predicted interaction is the dot product of an item vector and a user vector:

$$\hat r_{u,i} = q_{i}^{T}p_{u}$$

The challenge in the matrix factorization problem is to find $q_{i}$ and $p_{u}$. The key parameter is the latent dimension $k$, chosen by the programmer to be much smaller than the number of users or items. The user matrix then has size (number of users) $\times$ $k$, and the item matrix has size (number of items) $\times$ $k$.
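A minimal NumPy sketch of these shapes, using random factors purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 3, 4, 2  # k is the latent dimension; in practice k << n_users, n_items

P = rng.normal(size=(n_users, k))  # user matrix: one row p_u per user
Q = rng.normal(size=(n_items, k))  # item matrix: one row q_i per item

R_hat = P @ Q.T  # predicted interaction matrix: entry (u, i) equals q_i^T p_u
```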

The learning objective is to minimize the difference between the observed interactions and the predicted ones. Furthermore, to avoid overfitting, the learning process is regularized:

$$\min\sum_{(u,i)\in \mathcal{K}}(r_{u,i} - q_{i}^{T}p_{u})^2 + \lambda(||q_{i}||^2 + ||p_{u}||^2)$$

where $\mathcal{K}$ is the set of observed user-item pairs.
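This objective can be written directly in NumPy; the sketch below assumes that a zero in the matrix means "no interaction":

```python
import numpy as np

def regularized_loss(R, P, Q, lam):
    """Squared error over observed cells plus L2 regularization."""
    mask = R > 0                # assumption: 0 marks an unobserved pair
    err = (R - P @ Q.T)[mask]   # residuals on observed entries only
    return (err ** 2).sum() + lam * ((P ** 2).sum() + (Q ** 2).sum())
```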

One of the most well-known methods for solving the matrix factorization problem is the Alternating Least Squares (ALS) algorithm.

The basic idea of ALS is to fix one of $q_{i}$ and $p_{u}$, optimize the other, and then alternate. Holding one factor constant makes the objective at each iteration convex and solvable in closed form. This iterative computation can be parallelized and/or distributed.
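A small, dense NumPy sketch of ALS (not the Spark implementation; for illustration only). Each step solves a regularized least-squares problem in closed form for one side while the other is held fixed:

```python
import numpy as np

def als(R, k=2, lam=0.1, n_iters=10, seed=0):
    """Alternating Least Squares on a dense matrix R where 0 = unobserved."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = rng.normal(scale=0.1, size=(n_users, k))  # user factors
    Q = rng.normal(scale=0.1, size=(n_items, k))  # item factors
    mask = R > 0
    I = np.eye(k)
    for _ in range(n_iters):
        # Fix Q, solve for each user's factors p_u in closed form.
        for u in range(n_users):
            obs = mask[u]
            if obs.any():
                P[u] = np.linalg.solve(Q[obs].T @ Q[obs] + lam * I,
                                       Q[obs].T @ R[u, obs])
        # Fix P, solve for each item's factors q_i in closed form.
        for i in range(n_items):
            obs = mask[:, i]
            if obs.any():
                Q[i] = np.linalg.solve(P[obs].T @ P[obs] + lam * I,
                                       P[obs].T @ R[obs, i])
    return P, Q
```

Because each user's (and each item's) solve is independent of the others, these inner loops are exactly what a distributed implementation such as Spark's parallelizes across a cluster.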

In the Recommenders repository, you can find an open-source example of ALS in Spark.