A case study in the application of tree based methods for modeling claims frequencies

Master's thesis defense by Michala Koch Heerup

Title: A case study in the application of tree based methods for modeling claims frequencies

Abstract: The aim of this thesis is to investigate the theory and application of tree based models on real insurance data for the purpose of modeling claims frequencies. An important a priori assumption is that the number of claims is Poisson distributed. The data is provided by a Danish insurance company and contains information about bicycle insurance over the insurance years 2011-2016. For the analysis we fit models from three families of tree based models (single regression trees, random forests and tree boosters) and compare these to a benchmark generalized linear model. In order to do so we introduce a variety of visual aids for interpreting the models, and draw attention to the concepts of overfitting, cross validation and parameter tuning. All models are fitted to data from the first five insurance years and evaluated on how well they predict the data from 2016. The models are assessed on several performance measures, and finally we consider whether tree based models could replace the generalized linear model in practice. The boosters turned out to outperform the other models, but they are difficult to control. We suggest alternative scenarios where the tree based models might be more useful in practice.
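The evaluation setup described in the abstract (fit on the first five insurance years, predict 2016) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy records, the helper names, the constant-frequency model that stands in for the thesis's actual models, and the choice of mean Poisson deviance as one possible performance measure. The abstract does not state which measures were used.

```python
import math

# Hypothetical toy records: (insurance_year, exposure_in_years, claim_count).
# These are NOT the thesis data; they only illustrate the mechanics.
records = [
    (2011, 1.0, 0), (2012, 0.5, 1), (2013, 1.0, 0),
    (2014, 1.0, 1), (2015, 0.5, 0), (2016, 1.0, 0), (2016, 1.0, 1),
]

def split_by_year(rows, test_year):
    """Out-of-time split: earlier years for fitting, test_year for evaluation."""
    train = [r for r in rows if r[0] < test_year]
    test = [r for r in rows if r[0] == test_year]
    return train, test

def fit_constant_frequency(rows):
    """Poisson MLE of a single claim frequency: total claims / total exposure."""
    return sum(n for _, _, n in rows) / sum(e for _, e, _ in rows)

def mean_poisson_deviance(rows, frequency):
    """Average Poisson deviance of predictions mu = frequency * exposure."""
    total = 0.0
    for _, exposure, n in rows:
        mu = frequency * exposure
        # The n * log(n / mu) term is taken as 0 when n == 0.
        log_term = n * math.log(n / mu) if n > 0 else 0.0
        total += 2.0 * (log_term - (n - mu))
    return total / len(rows)

train, test = split_by_year(records, test_year=2016)
freq = fit_constant_frequency(train)      # fitted on 2011-2015 only
dev = mean_poisson_deviance(test, freq)   # evaluated on 2016 only
print(f"fitted frequency: {freq:.3f}, mean Poisson deviance on 2016: {dev:.4f}")
```

Any fitted model, whether a tree, a forest, a booster or a generalized linear model, could replace the constant frequency here as long as it returns an expected claim count per policy, which makes the out-of-time comparison between model families direct.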

 

Supervisor: Jostein Paulsen
External examiner: Mette M. Havning