Explainability of tree ensemble predictions

Master's thesis defense: Alberte Winther Remfeldt

Title:  Explainability of tree ensemble predictions


Abstract:  In this thesis, we consider the ability to explain tree ensemble predictions while maintaining high predictive performance. More specifically, we use the TreeExplainer algorithm to calculate SHAP values for a Random Forest and compare these with the components of a Random Planted Forest, a directly interpretable tree ensemble. We find that, in terms of global interpretability of a feature's effect on the model predictions, the Random Planted Forest has the advantage.
The structure of a Random Planted Forest allows it to fully separate main effects and interaction effects of all orders, whereas the SHAP main effect and second-order interaction values suffer from the implicit inclusion of higher-order interactions. The Random Planted Forest also achieves higher predictive performance than the Random Forest on test data, but only when interactions are allowed. However, the separation of main and interaction effects in the Random Planted Forest is shown to make explanations of single predictions more difficult, because many component values play a role in each prediction.
Here the SHAP values of the Random Forest have the advantage, as they can be collapsed into one value per feature.
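To illustrate what "collapsing into one value per feature" can mean, the sketch below computes exact interventional SHAP values by brute-force subset enumeration for a hypothetical toy model with two main effects and one pairwise interaction. It is not TreeExplainer and not the thesis's code; the model, the background data, and the helper names `model`, `value`, and `exact_shap` are all assumptions made for illustration. For this particular background, each feature's SHAP value works out to its main term plus half of the interaction term, so three additive components are folded into two per-feature numbers.

```python
from itertools import combinations
from math import factorial


def model(x):
    # Hypothetical toy model: main effects x0 and 2*x1 plus the interaction x0*x1.
    return x[0] + 2 * x[1] + x[0] * x[1]


def value(model, x, background, subset):
    """Interventional value function v(S): features in `subset` are fixed to x,
    the remaining features are taken from each background row, and the
    model output is averaged over the background."""
    total = 0.0
    for z in background:
        mixed = [x[j] if j in subset else z[j] for j in range(len(x))]
        total += model(mixed)
    return total / len(background)


def exact_shap(model, x, background):
    """Exact Shapley values by enumerating every feature subset.
    Only feasible for a handful of features (cost grows as 2^p)."""
    p = len(x)
    phi = []
    for j in range(p):
        others = [k for k in range(p) if k != j]
        phi_j = 0.0
        for size in range(p):
            # Shapley weight |S|! (p - |S| - 1)! / p!
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for s in combinations(others, size):
                s = set(s)
                gain = value(model, x, background, s | {j}) - value(model, x, background, s)
                phi_j += weight * gain
        phi.append(phi_j)
    return phi


# Assumed background data: both features average to zero, as does x0*x1.
background = [(-1, -1), (1, 1), (-1, 1), (1, -1)]
x = (2, 3)
phi = exact_shap(model, x, background)
base = value(model, x, background, set())
print(phi)         # [5.0, 9.0]: main terms 2 and 6, each plus half of x0*x1 = 6
print(sum(phi) + base == model(x))  # True: the values add up to the prediction
```

The single number per feature is convenient for explaining one prediction, but it no longer shows which part came from the interaction. That is the trade-off the abstract describes.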


Supervisor:         Niels Richard Hansen
External examiner:  Niels Wæver Hartvig, Novo Nordisk