Kernel-based tests for joint independence

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

Kernel-based tests for joint independence. / Pfister, Niklas; Bühlmann, Peter; Schölkopf, Bernhard; Peters, Jonas.

In: Journal of the Royal Statistical Society, Series B (Statistical Methodology), Vol. 80, No. 1, 01.01.2018, p. 5-31.


Harvard

Pfister, N, Bühlmann, P, Schölkopf, B & Peters, J 2018, 'Kernel-based tests for joint independence', Journal of the Royal Statistical Society, Series B (Statistical Methodology), vol. 80, no. 1, pp. 5-31. https://doi.org/10.1111/rssb.12235

APA

Pfister, N., Bühlmann, P., Schölkopf, B., & Peters, J. (2018). Kernel-based tests for joint independence. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 80(1), 5-31. https://doi.org/10.1111/rssb.12235

Vancouver

Pfister N, Bühlmann P, Schölkopf B, Peters J. Kernel-based tests for joint independence. Journal of the Royal Statistical Society, Series B (Statistical Methodology). 2018 Jan 1;80(1):5-31. https://doi.org/10.1111/rssb.12235

Author

Pfister, Niklas ; Bühlmann, Peter ; Schölkopf, Bernhard ; Peters, Jonas. / Kernel-based tests for joint independence. In: Journal of the Royal Statistical Society, Series B (Statistical Methodology). 2018 ; Vol. 80, No. 1. pp. 5-31.

Bibtex

@article{d25bedcaa0df4d9e94d5a7e2c6aa00ad,
title = "Kernel-based tests for joint independence",
abstract = "We investigate the problem of testing whether $d$ random variables, which may or may not be continuous, are jointly (or mutually) independent. Our method builds on ideas of the two variable Hilbert-Schmidt independence criterion (HSIC) but allows for an arbitrary number of variables. We embed the $d$-dimensional joint distribution and the product of the marginals into a reproducing kernel Hilbert space and define the $d$-variable Hilbert-Schmidt independence criterion (dHSIC) as the squared distance between the embeddings. In the population case, the value of dHSIC is zero if and only if the $d$ variables are jointly independent, as long as the kernel is characteristic. Based on an empirical estimate of dHSIC, we define three different non-parametric hypothesis tests: a permutation test, a bootstrap test and a test based on a Gamma approximation. We prove that the permutation test achieves the significance level and that the bootstrap test achieves pointwise asymptotic significance level as well as pointwise asymptotic consistency (i.e., it is able to detect any type of fixed dependence in the large sample limit). The Gamma approximation does not come with these guarantees; however, it is computationally very fast and for small $d$, it performs well in practice. Finally, we apply the test to a problem in causal discovery.",
keywords = "Causal inference, Independence test, Kernel methods, V-statistics",
author = "Niklas Pfister and Peter B{\"u}hlmann and Bernhard Sch{\"o}lkopf and Jonas Peters",
year = "2018",
month = jan,
day = "1",
doi = "10.1111/rssb.12235",
language = "English",
volume = "80",
pages = "5--31",
journal = "Journal of the Royal Statistical Society, Series B (Statistical Methodology)",
issn = "1369-7412",
publisher = "Wiley",
number = "1",

}
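
The abstract above describes dHSIC as the squared RKHS distance between the embedding of the joint distribution and the embedding of the product of the marginals. For orientation, here is a minimal sketch of that definition; the mean-embedding notation $\mu(\cdot)$ and the product space $\mathcal{H}$ are notational assumptions of this sketch, not quotations from the paper:

\[
  \operatorname{dHSIC}\bigl(P^{(X^1,\dots,X^d)}\bigr)
    = \bigl\lVert \mu\bigl(P^{X^1}\otimes\cdots\otimes P^{X^d}\bigr)
      - \mu\bigl(P^{(X^1,\dots,X^d)}\bigr) \bigr\rVert_{\mathcal{H}}^{2},
\]

which, as the abstract states, equals zero if and only if $X^1,\dots,X^d$ are jointly independent, provided the kernel is characteristic. A commonly used Gram-matrix form of the corresponding V-statistic estimate (again a reading aid derived from standard HSIC algebra, not quoted from the paper) is

\[
  \widehat{\operatorname{dHSIC}}_n
    = \frac{1}{n^2}\sum_{i,j=1}^{n}\prod_{m=1}^{d} K^{m}_{ij}
    \;+\; \prod_{m=1}^{d}\frac{1}{n^2}\sum_{i,j=1}^{n} K^{m}_{ij}
    \;-\; \frac{2}{n}\sum_{i=1}^{n}\prod_{m=1}^{d}\frac{1}{n}\sum_{j=1}^{n} K^{m}_{ij},
\]

where $K^{m}$ denotes the $n \times n$ Gram matrix of the kernel on the $m$-th variable.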

RIS

TY - JOUR

T1 - Kernel-based tests for joint independence

AU - Pfister, Niklas

AU - Bühlmann, Peter

AU - Schölkopf, Bernhard

AU - Peters, Jonas

PY - 2018/1/1

Y1 - 2018/1/1

N2 - We investigate the problem of testing whether $d$ random variables, which may or may not be continuous, are jointly (or mutually) independent. Our method builds on ideas of the two variable Hilbert-Schmidt independence criterion (HSIC) but allows for an arbitrary number of variables. We embed the $d$-dimensional joint distribution and the product of the marginals into a reproducing kernel Hilbert space and define the $d$-variable Hilbert-Schmidt independence criterion (dHSIC) as the squared distance between the embeddings. In the population case, the value of dHSIC is zero if and only if the $d$ variables are jointly independent, as long as the kernel is characteristic. Based on an empirical estimate of dHSIC, we define three different non-parametric hypothesis tests: a permutation test, a bootstrap test and a test based on a Gamma approximation. We prove that the permutation test achieves the significance level and that the bootstrap test achieves pointwise asymptotic significance level as well as pointwise asymptotic consistency (i.e., it is able to detect any type of fixed dependence in the large sample limit). The Gamma approximation does not come with these guarantees; however, it is computationally very fast and for small $d$, it performs well in practice. Finally, we apply the test to a problem in causal discovery.

AB - We investigate the problem of testing whether $d$ random variables, which may or may not be continuous, are jointly (or mutually) independent. Our method builds on ideas of the two variable Hilbert-Schmidt independence criterion (HSIC) but allows for an arbitrary number of variables. We embed the $d$-dimensional joint distribution and the product of the marginals into a reproducing kernel Hilbert space and define the $d$-variable Hilbert-Schmidt independence criterion (dHSIC) as the squared distance between the embeddings. In the population case, the value of dHSIC is zero if and only if the $d$ variables are jointly independent, as long as the kernel is characteristic. Based on an empirical estimate of dHSIC, we define three different non-parametric hypothesis tests: a permutation test, a bootstrap test and a test based on a Gamma approximation. We prove that the permutation test achieves the significance level and that the bootstrap test achieves pointwise asymptotic significance level as well as pointwise asymptotic consistency (i.e., it is able to detect any type of fixed dependence in the large sample limit). The Gamma approximation does not come with these guarantees; however, it is computationally very fast and for small $d$, it performs well in practice. Finally, we apply the test to a problem in causal discovery.

KW - Causal inference

KW - Independence test

KW - Kernel methods

KW - V-statistics

U2 - 10.1111/rssb.12235

DO - 10.1111/rssb.12235

M3 - Journal article

VL - 80

SP - 5

EP - 31

JO - Journal of the Royal Statistical Society, Series B (Statistical Methodology)

JF - Journal of the Royal Statistical Society, Series B (Statistical Methodology)

SN - 1369-7412

IS - 1

ER -
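
As a concrete illustration of the procedure sketched in the abstract (the empirical dHSIC statistic combined with a permutation test), the following self-contained NumPy sketch re-implements the Gram-matrix estimator given above. It is an independent illustration under assumptions made here, not the authors' reference code (an R package, dHSIC, is available on CRAN); the Gaussian kernel, the median-heuristic bandwidth, the Monte-Carlo permutation scheme, and all function names (gaussian_gram, dhsic, dhsic_permutation_test) are choices made for this example.

# Illustrative sketch of the empirical dHSIC statistic and a permutation
# test, reconstructed from the abstract above. Kernel choice and names are
# assumptions for this example, not the authors' reference implementation.
import numpy as np


def gaussian_gram(x, bandwidth=None):
    """n x n Gaussian-kernel Gram matrix for one variable (n or n x p array)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    sq = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    if bandwidth is None:  # median heuristic (an assumption of this sketch)
        med = np.median(sq[sq > 0]) if np.any(sq > 0) else 1.0
        bandwidth = np.sqrt(0.5 * med)
    return np.exp(-sq / (2.0 * bandwidth ** 2))


def dhsic(samples):
    """Empirical dHSIC for a list of d samples, each with n observations.

    Gram-matrix form of the V-statistic estimator:
      mean of the elementwise product of the Gram matrices
      + product of the Gram-matrix means
      - 2 * mean over i of the product of row means.
    """
    grams = np.stack([gaussian_gram(x) for x in samples])  # shape (d, n, n)
    term1 = np.mean(np.prod(grams, axis=0))
    term2 = np.prod(grams.mean(axis=(1, 2)))
    term3 = np.mean(np.prod(grams.mean(axis=2), axis=0))
    return term1 + term2 - 2.0 * term3


def dhsic_permutation_test(samples, n_permutations=200, seed=0):
    """Monte-Carlo permutation p-value: permute each variable's rows
    independently, which mimics the joint-independence null."""
    rng = np.random.default_rng(seed)
    samples = [np.asarray(x) for x in samples]
    n = len(samples[0])
    observed = dhsic(samples)
    null_stats = [
        dhsic([x[rng.permutation(n)] for x in samples])
        for _ in range(n_permutations)
    ]
    # add-one correction keeps the estimated p-value strictly positive
    p_value = (1 + sum(s >= observed for s in null_stats)) / (1 + n_permutations)
    return observed, p_value


if __name__ == "__main__":
    # Classic example: a, b, c = a*b are pairwise independent but jointly
    # dependent, which is exactly the kind of dependence a joint test targets.
    rng = np.random.default_rng(1)
    a = rng.choice([-1.0, 1.0], size=300)
    b = rng.choice([-1.0, 1.0], size=300)
    c = a * b
    stat, p = dhsic_permutation_test([a, b, c])
    print(f"dHSIC = {stat:.4f}, permutation p-value = {p:.3f}")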
