Seminar in Applied Mathematics and Statistics

SPEAKER: Nicola Gnecco (University of Geneva)

TITLE: Extremal Random Forest

ABSTRACT: Quantile regression methods from statistics and machine learning fail when the quantile of interest is so extreme that only a few or no training data points exceed it. To overcome this problem, asymptotic results from extreme value theory can be used to extrapolate beyond the range of the data, and several approaches exist that rely on linear regression, kernel methods, or generalized additive models. These methods break down, however, when the predictor space has more than a few dimensions or when the regression function of the extreme quantiles is complex. We propose a method for extreme quantile regression that combines the flexibility of random forests with the theory of extrapolation. Our extremal random forest (ERF) estimates the parameters of a generalized Pareto distribution, conditional on the predictor vector, by maximizing a weighted likelihood whose localizing weights are extracted from a quantile random forest. We further penalize the shape parameter in this likelihood to regularize its variability over the predictor space. A range of simulation setups shows that ERF outperforms both the classical machine learning methods for quantile regression and the existing regression approaches from extreme value theory.
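
For concreteness, the estimation step described in the abstract can be sketched as a weighted, penalized maximum likelihood problem; the notation below (forest weights w_i(x), exceedances Z_i over an intermediate threshold, penalty level lambda, reference shape xi_0) is assumed here for illustration and may differ from the speaker's exact formulation:

\[
  \bigl(\hat\sigma(x), \hat\xi(x)\bigr)
  = \arg\max_{\sigma > 0,\ \xi}\;
    \sum_{i=1}^{n} w_i(x)\,\ell_{\sigma,\xi}(Z_i)
    \;-\; \lambda\,\bigl(\xi - \xi_0\bigr)^2,
  \qquad
  \ell_{\sigma,\xi}(z) = -\log\sigma - \Bigl(1 + \tfrac{1}{\xi}\Bigr)\log\Bigl(1 + \xi\,\tfrac{z}{\sigma}\Bigr),
\]

where \(\ell_{\sigma,\xi}\) is the generalized Pareto log-density and the weights \(w_i(x)\) reflect, roughly, how often training point i shares a leaf with the test point x across the trees of the quantile random forest. The fitted pair \(\bigl(\hat\sigma(x), \hat\xi(x)\bigr)\) can then be plugged into the generalized Pareto quantile function to extrapolate from an intermediate quantile to the extreme quantile of interest.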