Modeling Policyholder Retention Using Machine Learning Techniques

Master's thesis defense by Joakim Puggaard Pagels

Title: Modeling Policyholder Retention Using Machine Learning Techniques

Abstract: Retention is an important factor that affects both the profit and growth of insurance companies. This thesis applies several machine learning methods to four years of auto insurance data from Købstædernes Forsikring with the purpose of predicting policyholder retention. Insurance retention models are often presented under the generalized linear model (GLM) framework due to the popularity of GLMs in insurance. This thesis investigates different machine learning algorithms, such as tree-based methods and several boosting methods, and explores how they perform in comparison to classical logistic regression. Techniques for data preprocessing, which generally refer to the addition, deletion, and transformation of data, are explored and applied. We introduce the general theory of statistical learning along with several machine learning algorithms. Multiple metrics are available to evaluate classification models; we present a broad set of measures to obtain a comprehensive and detailed assessment of model performance. The results of this study show that the complex non-linear methods perform better than classical logistic regression. XGBoost is selected as the final model. Nonetheless, methods such as GBM, AdaBoost, and random forest also showed potential. Finally, several practical applications of the final model are proposed.

Supervisor:        Jostein Paulsen
External examiner: Mette Havning