If you want to skip directly to the code, you can find it here.
Fraud Detection
According to a report published by Nilson, worldwide losses from card fraud reached 22.8 billion dollars in 2017. The problem is forecast to get worse in the coming years: by 2021, the card fraud bill is expected to reach 32.96 billion dollars.
In the Python notebook we created, we use the credit card fraud detection dataset from Kaggle to identify fraud cases. The machine learning algorithm is a gradient boosted tree based on the LightGBM library, which has recently become one of the most popular libraries among top participants in Kaggle competitions. LightGBM is available in both CPU and GPU versions.
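As a rough illustration of how such a model could be configured, the sketch below builds a LightGBM parameter dictionary that reweights the rare fraud class. The helper names `imbalance_params` and `train_model` are hypothetical, the hyperparameter values are illustrative rather than taken from the notebook, and setting `scale_pos_weight` to the negative-to-positive ratio is a common heuristic, not necessarily what the notebook does.

```python
from collections import Counter


def imbalance_params(labels, learning_rate=0.05, num_leaves=31):
    """Build a LightGBM parameter dict that upweights the rare positive class.

    `labels` is a sequence of 0 (legitimate) and 1 (fraud).
    """
    counts = Counter(labels)
    n_neg, n_pos = counts[0], counts[1]
    if n_pos == 0:
        raise ValueError("need at least one fraud example to train")
    return {
        "objective": "binary",
        "metric": "auc",
        # Heuristic (an assumption here): ratio of negatives to positives.
        "scale_pos_weight": n_neg / n_pos,
        "learning_rate": learning_rate,  # illustrative value
        "num_leaves": num_leaves,        # illustrative value
    }


def train_model(features, labels, num_boost_round=100):
    """Train a booster. lightgbm is imported lazily so that
    imbalance_params can be used without the library installed."""
    import lightgbm as lgb

    train_set = lgb.Dataset(features, label=labels)
    params = imbalance_params(labels)
    return lgb.train(params, train_set, num_boost_round=num_boost_round)
```

With 99 legitimate transactions and 1 fraud, `imbalance_params` sets `scale_pos_weight` to 99.0, so an error on the single fraud case costs roughly as much as errors on all the legitimate ones combined.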
Fraud detection problems are known for being extremely imbalanced: fraudulent transactions are a tiny fraction of the total. Boosting is one technique that usually works well with these kinds of datasets. It iteratively builds weak classifiers (decision trees), weighting the instances so that each round improves on the previous one. In the first round, a weak classifier is trained and tested on all the training data; the instances it classifies poorly are weighted to appear more often in the next data subset. Finally, all the classifiers are combined into an ensemble with a weighted average of their estimates.
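The reweighting scheme described above is the one used by AdaBoost, and it can be sketched in a few lines of plain Python. This is a toy illustration on a one-dimensional feature with decision stumps as weak classifiers, and all function names are hypothetical. It is not how LightGBM works internally: gradient boosting fits each new tree to the gradient of the loss rather than resampling by instance weight.

```python
import math


def stump_predict(x, threshold, polarity):
    """A decision stump: one threshold on a single feature."""
    return polarity if x >= threshold else -polarity


def fit_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for w, x, y in zip(weights, xs, ys)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def adaboost(xs, ys, rounds=5):
    """Train `rounds` stumps. Labels must be +1 (fraud) or -1 (legitimate)."""
    n = len(xs)
    weights = [1.0 / n] * n  # start with uniform instance weights
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = fit_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # vote of this classifier
        ensemble.append((alpha, threshold, polarity))
        # Misclassified instances gain weight, correct ones lose weight.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Weighted vote of all stumps."""
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1
```

On a dataset that no single stump can separate, such as `xs = list(range(10))` with labels `[1, 1, 1, -1, -1, -1, 1, 1, 1, -1]`, one round misclassifies three points, while three rounds classify every training point correctly.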
To operationalize the model, we designed a RESTful API based on Flask. The input to the API is a transaction (defined by its features), and the output is the model prediction.
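A minimal sketch of what such an endpoint might look like follows. The `/predict` route, the JSON field names, the 0.5 threshold, and the helpers `parse_transaction` and `create_app` are all assumptions made for illustration. The model is assumed to be any object whose `predict` method takes a list of rows and returns one fraud probability per row.

```python
def parse_transaction(payload, feature_names):
    """Return the feature values in model order, or raise ValueError."""
    if not isinstance(payload, dict):
        raise ValueError("request body must be a JSON object")
    missing = [name for name in feature_names if name not in payload]
    if missing:
        raise ValueError("missing features: " + ", ".join(missing))
    return [float(payload[name]) for name in feature_names]


def create_app(model, feature_names, threshold=0.5):
    """Build the Flask app. Flask is imported lazily so that
    parse_transaction can be reused and tested without it."""
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        try:
            # silent=True yields None instead of an error on a bad body.
            row = parse_transaction(request.get_json(silent=True), feature_names)
        except (ValueError, TypeError) as exc:
            return jsonify({"error": str(exc)}), 400
        probability = float(model.predict([row])[0])
        return jsonify({"fraud_probability": probability,
                        "is_fraud": probability >= threshold})

    return app
```

A client would then POST a JSON object such as `{"Amount": 12.5, "V1": -1.3}` to `/predict` and receive the probability and a boolean flag. Keeping validation in a separate function means malformed requests are rejected with a 400 status before the model is ever called.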
Additionally, we designed a websocket service to visualize fraudulent transactions on a map. When a new transaction is sent to the API, the model predicts whether the transaction is legitimate or fraudulent. If the transaction is fraudulent, the server sends a signal to the web client, which renders a world map showing the location of the fraudulent transaction.
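The notification step could be sketched as below. The event name `fraud_alert`, the `latitude` and `longitude` fields, and both function names are hypothetical, since the text does not say how the location is attached to a transaction or which websocket library is used. The `emitter` is assumed to be any object exposing `emit(event, data)`, for example a Flask-SocketIO `SocketIO` instance.

```python
def build_fraud_event(transaction, probability, threshold=0.5):
    """Return the message for the map client, or None if the
    transaction is considered legitimate."""
    if probability < threshold:
        return None
    return {
        "latitude": float(transaction["latitude"]),    # hypothetical field
        "longitude": float(transaction["longitude"]),  # hypothetical field
        "fraud_probability": round(probability, 4),
    }


def notify_clients(emitter, transaction, probability, threshold=0.5):
    """Emit a `fraud_alert` event for fraudulent transactions only.

    Returns True if an event was sent.
    """
    event = build_fraud_event(transaction, probability, threshold)
    if event is None:
        return False
    emitter.emit("fraud_alert", event)
    return True
```

The API handler would call `notify_clients` right after scoring a transaction, so legitimate traffic never reaches the browser and the map client only has to place a marker for each event it receives.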