Anchor regression: Heterogeneous data meet causality

Research output: Contribution to journal; Journal article; peer-review

Standard

Anchor regression: Heterogeneous data meet causality. / Rothenhäusler, Dominik; Meinshausen, Nicolai; Bühlmann, Peter; Peters, Jonas.

In: Journal of the Royal Statistical Society. Series B: Statistical Methodology, Vol. 83, No. 2, 2021, p. 215–246.

Harvard

Rothenhäusler, D, Meinshausen, N, Bühlmann, P & Peters, J 2021, 'Anchor regression: Heterogeneous data meet causality', Journal of the Royal Statistical Society. Series B: Statistical Methodology, vol. 83, no. 2, pp. 215–246. https://doi.org/10.1111/rssb.12398

APA

Rothenhäusler, D., Meinshausen, N., Bühlmann, P., & Peters, J. (2021). Anchor regression: Heterogeneous data meet causality. Journal of the Royal Statistical Society. Series B: Statistical Methodology, 83(2), 215–246. https://doi.org/10.1111/rssb.12398

Vancouver

Rothenhäusler D, Meinshausen N, Bühlmann P, Peters J. Anchor regression: Heterogeneous data meet causality. Journal of the Royal Statistical Society. Series B: Statistical Methodology. 2021;83(2):215–246. https://doi.org/10.1111/rssb.12398

Author

Rothenhäusler, Dominik; Meinshausen, Nicolai; Bühlmann, Peter; Peters, Jonas. / Anchor regression: Heterogeneous data meet causality. In: Journal of the Royal Statistical Society. Series B: Statistical Methodology. 2021; Vol. 83, No. 2. pp. 215–246.

Bibtex

@article{40601df562d1484ca2f6ce0146a29d03,
title = "Anchor regression: Heterogeneous data meet causality",
abstract = "We consider the problem of predicting a response variable from a set of covariates on a data set that differs in distribution from the training data. Causal parameters are optimal in terms of predictive accuracy if in the new distribution either many variables are affected by interventions or only some variables are affected, but the perturbations are strong. If the training and test distributions differ by a shift, causal parameters might be too conservative to perform well on the above task. This motivates anchor regression, a method that makes use of exogenous variables to solve a relaxation of the {\textquoteleft}causal{\textquoteright} minimax problem by considering a modification of the least-squares loss. The procedure naturally provides an interpolation between the solutions of ordinary least squares (OLS) and two-stage least squares. We prove that the estimator satisfies predictive guarantees in terms of distributional robustness against shifts in a linear class; these guarantees are valid even if the instrumental variable assumptions are violated. If anchor regression and least squares provide the same answer ({\textquoteleft}anchor stability{\textquoteright}), we establish that OLS parameters are invariant under certain distributional changes. Anchor regression is shown empirically to improve replicability and protect against distributional shifts.",
keywords = "causal inference, distributional robustness, replicability, structural equation modelling",
author = "Dominik Rothenh{\"a}usler and Nicolai Meinshausen and Peter B{\"u}hlmann and Jonas Peters",
year = "2021",
doi = "10.1111/rssb.12398",
language = "English",
volume = "83",
pages = "215--246",
journal = "Journal of the Royal Statistical Society, Series B (Statistical Methodology)",
issn = "1369-7412",
publisher = "Wiley",
number = "2",

}

RIS

TY - JOUR

T1 - Anchor regression

T2 - Heterogeneous data meet causality

AU - Rothenhäusler, Dominik

AU - Meinshausen, Nicolai

AU - Bühlmann, Peter

AU - Peters, Jonas

PY - 2021

Y1 - 2021

N2 - We consider the problem of predicting a response variable from a set of covariates on a data set that differs in distribution from the training data. Causal parameters are optimal in terms of predictive accuracy if in the new distribution either many variables are affected by interventions or only some variables are affected, but the perturbations are strong. If the training and test distributions differ by a shift, causal parameters might be too conservative to perform well on the above task. This motivates anchor regression, a method that makes use of exogenous variables to solve a relaxation of the ‘causal’ minimax problem by considering a modification of the least-squares loss. The procedure naturally provides an interpolation between the solutions of ordinary least squares (OLS) and two-stage least squares. We prove that the estimator satisfies predictive guarantees in terms of distributional robustness against shifts in a linear class; these guarantees are valid even if the instrumental variable assumptions are violated. If anchor regression and least squares provide the same answer (‘anchor stability’), we establish that OLS parameters are invariant under certain distributional changes. Anchor regression is shown empirically to improve replicability and protect against distributional shifts.

AB - We consider the problem of predicting a response variable from a set of covariates on a data set that differs in distribution from the training data. Causal parameters are optimal in terms of predictive accuracy if in the new distribution either many variables are affected by interventions or only some variables are affected, but the perturbations are strong. If the training and test distributions differ by a shift, causal parameters might be too conservative to perform well on the above task. This motivates anchor regression, a method that makes use of exogenous variables to solve a relaxation of the ‘causal’ minimax problem by considering a modification of the least-squares loss. The procedure naturally provides an interpolation between the solutions of ordinary least squares (OLS) and two-stage least squares. We prove that the estimator satisfies predictive guarantees in terms of distributional robustness against shifts in a linear class; these guarantees are valid even if the instrumental variable assumptions are violated. If anchor regression and least squares provide the same answer (‘anchor stability’), we establish that OLS parameters are invariant under certain distributional changes. Anchor regression is shown empirically to improve replicability and protect against distributional shifts.

KW - causal inference

KW - distributional robustness

KW - replicability

KW - structural equation modelling

UR - http://www.scopus.com/inward/record.url?scp=85099740900&partnerID=8YFLogxK

U2 - 10.1111/rssb.12398

DO - 10.1111/rssb.12398

M3 - Journal article

AN - SCOPUS:85099740900

VL - 83

SP - 215

EP - 246

JO - Journal of the Royal Statistical Society, Series B (Statistical Methodology)

JF - Journal of the Royal Statistical Society, Series B (Statistical Methodology)

SN - 1369-7412

IS - 2

ER -