Distance Correlation for Testing Independence with Application to Insurance Data

Master's thesis defence by Rasmus Bakke Ahlmann

Title: Distance Correlation for Testing Independence with Application to Insurance Data

Abstract: This Master's thesis develops and assesses a proposed generalisation of the dependence measure known as distance correlation, which was introduced in its original form by Székely et al. in 2007. The distance correlation coefficient characterises dependence between two random vectors of arbitrary dimensions and is zero if and only if the vectors are independent. Analogously to the Pearson correlation coefficient, the distance correlation coefficient can be seen as a standardised version of distance covariance. For two random vectors, distance covariance is defined as the squared L²-distance, with respect to a measure μ, between the joint characteristic function and the product of the marginal characteristic functions. The generalisation proposed by Davis et al. concerns the choice of the measure μ, which determines the properties of distance correlation and the conditions under which it is defined. In addition to the original infinite measure μ, which implies moment conditions, we suggest symmetric probability measures as alternative choices of μ. The sample versions of these dependence measures are averages of functions of the Euclidean distances between all pairs of sample elements. Under the assumption of independence, the normalised sample distance covariance has a limiting distribution, given by the (intractable) squared L²-norm of a Gaussian process that depends on the underlying distributions and on μ. The existence of this limit motivates a test of independence based on bootstrap methods. Simulation studies show that distance correlation, both in its original form as introduced by Székely et al. and with the proposed alternative choices of μ, performs well for any kind of dependence structure that materialises on the entire support of the underlying distributions. When the dependence is confined to the tails, however, only the original measure μ shows satisfactory empirical power among the choices examined, and this infinite measure appears superior to the finite alternatives.
An application to insurance data shows that distance correlation in its original definition is well suited to problems of this kind because it places increased emphasis on the tail behaviour of the underlying distributions.
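
As an illustration of the sample quantities described in the abstract, the following is a minimal Python sketch of the sample distance correlation in the original Székely et al. form, computed from double-centred Euclidean distance matrices. It is not the thesis's implementation: the function names are hypothetical, the generalised version with a probability measure μ is not covered, and a permutation test is used here as a simple stand-in for the bootstrap test mentioned in the abstract.

```python
import numpy as np


def _centered_distances(x):
    """Pairwise Euclidean distance matrix of the sample, double-centred."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D array as n observations of dimension 1
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation (V-statistic form, original weight measure)."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = (a * b).mean()      # squared sample distance covariance
    dvar_x = (a * a).mean()     # squared sample distance variance of x
    dvar_y = (b * b).mean()     # squared sample distance variance of y
    denom = np.sqrt(dvar_x * dvar_y)
    if denom == 0:
        return 0.0              # a constant sample carries no dependence information
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def permutation_test(x, y, n_perm=199, seed=0):
    """Permutation p-value for independence (assumption: stand-in for the bootstrap)."""
    rng = np.random.default_rng(seed)
    a = _centered_distances(x)
    b = _centered_distances(y)
    observed = (a * b).mean()
    n = a.shape[0]
    count = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        # Permuting the y-sample permutes rows and columns of its centred matrix.
        if (a * b[np.ix_(p, p)]).mean() >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)


x = np.linspace(-1.0, 1.0, 50)
y = x ** 2  # uncorrelated with x in the Pearson sense, yet fully dependent
print(distance_correlation(x, y), permutation_test(x, y))
```

In the usage lines, the Pearson correlation of x and y is zero by symmetry, while the sample distance correlation is clearly positive, which is the behaviour that motivates the measure.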

Supervisor:        Thomas Mikosch
External examiner: Søren Asmussen, Aarhus Universitet