Optimizing and implementing machine learning techniques for detecting insurance fraud: An analysis using real data

Master's thesis defense by Andreas Korsbæk

Title:  Optimizing and implementing machine learning techniques for detecting insurance fraud: An analysis using real data





In this thesis we apply various machine learning algorithms to real insurance data from Codan Forsikring A/S. We explore different optimization strategies proposed by investigators within the company. The focus is on building fraud detection models that can be deployed in practice. We implement a framework that handles feature generation and missing values and supports model deployment. This includes functionality to explain each individual claim classification, thus eliminating the black-box aspect of complex machine learning models. A further goal is to identify the best-performing model. As a baseline we use logistic regression, but we later find that XGBoost outperforms it. We define and adhere to key validation concepts to avoid biased results. Model performance is estimated on an independent test set, on which the model correctly and reliably classifies fraud claims more than 30% of the time.
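
The abstract does not state which metric the figure of more than 30% refers to. As a purely illustrative sketch, and not the thesis's own code, the snippet below shows how precision and recall for the fraud class might be computed on a held-out test set. The function name `precision_recall` and the ten toy labels are hypothetical and are not taken from the Codan data.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for the positive class (1 = fraud)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fraud, flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # legitimate, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fraud, missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels for ten held-out claims (1 = fraud, 0 = legitimate).
# These values are invented for illustration only.
y_test = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
y_hat = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]

precision, recall = precision_recall(y_test, y_hat)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Precision is the share of flagged claims that are truly fraudulent, while recall is the share of fraudulent claims that get flagged. With heavily imbalanced fraud data the two can differ widely, so which one a headline figure reports matters for interpretation.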




Supervisor: Jostein Paulsen
External examiner: Mette M Havning