Invariant Policy Learning: A Causal Perspective

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

Invariant Policy Learning: A Causal Perspective. / Saengkyongam, Sorawit; Thams, Nikolaj; Peters, Jonas; Pfister, Niklas.

In: IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 45, No. 7, 2023, pp. 8606-8620.

Harvard

Saengkyongam, S, Thams, N, Peters, J & Pfister, N 2023, 'Invariant Policy Learning: A Causal Perspective', IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 7, pp. 8606-8620. https://doi.org/10.1109/TPAMI.2022.3232363

APA

Saengkyongam, S., Thams, N., Peters, J., & Pfister, N. (2023). Invariant Policy Learning: A Causal Perspective. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(7), 8606-8620. https://doi.org/10.1109/TPAMI.2022.3232363

Vancouver

Saengkyongam S, Thams N, Peters J, Pfister N. Invariant Policy Learning: A Causal Perspective. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2023;45(7):8606-8620. https://doi.org/10.1109/TPAMI.2022.3232363

Author

Saengkyongam, Sorawit ; Thams, Nikolaj ; Peters, Jonas ; Pfister, Niklas. / Invariant Policy Learning: A Causal Perspective. In: IEEE Transactions on Pattern Analysis and Machine Intelligence. 2023 ; Vol. 45, No. 7. pp. 8606-8620.

Bibtex

@article{a440e450e9a14243af5061b5ad147089,
title = "Invariant Policy Learning: A Causal Perspective",
abstract = "Contextual bandit and reinforcement learning algorithms have been successfully used in various interactive learning systems such as online advertising, recommender systems, and dynamic pricing. However, they have yet to be widely adopted in high-stakes application domains, such as healthcare. One reason may be that existing approaches assume that the underlying mechanisms are static in the sense that they do not change over different environments. In many real-world systems, however, the mechanisms are subject to shifts across environments which may invalidate the static environment assumption. In this paper, we take a step toward tackling the problem of environmental shifts considering the framework of offline contextual bandits. We view the environmental shift problem through the lens of causality and propose multi-environment contextual bandits that allow for changes in the underlying mechanisms. We adopt the concept of invariance from the causality literature and introduce the notion of policy invariance. We argue that policy invariance is only relevant if unobserved variables are present and show that, in that case, an optimal invariant policy is guaranteed to generalize across environments under suitable assumptions.",
keywords = "Causality, contextual bandits, distributional shift, Extraterrestrial measurements, Heuristic algorithms, off-policy learning, Particle measurements, Random variables, Reinforcement learning, Training, Visualization",
author = "Sorawit Saengkyongam and Nikolaj Thams and Jonas Peters and Niklas Pfister",
note = "Publisher Copyright: IEEE",
year = "2023",
doi = "10.1109/TPAMI.2022.3232363",
language = "English",
volume = "45",
pages = "8606--8620",
journal = "IEEE Transactions on Pattern Analysis and Machine Intelligence",
issn = "0162-8828",
publisher = "Institute of Electrical and Electronics Engineers",
number = "7",

}

RIS

TY - JOUR
T1 - Invariant Policy Learning
T2 - A Causal Perspective
AU - Saengkyongam, Sorawit
AU - Thams, Nikolaj
AU - Peters, Jonas
AU - Pfister, Niklas
N1 - Publisher Copyright: IEEE
PY - 2023
Y1 - 2023
N2 - Contextual bandit and reinforcement learning algorithms have been successfully used in various interactive learning systems such as online advertising, recommender systems, and dynamic pricing. However, they have yet to be widely adopted in high-stakes application domains, such as healthcare. One reason may be that existing approaches assume that the underlying mechanisms are static in the sense that they do not change over different environments. In many real-world systems, however, the mechanisms are subject to shifts across environments which may invalidate the static environment assumption. In this paper, we take a step toward tackling the problem of environmental shifts considering the framework of offline contextual bandits. We view the environmental shift problem through the lens of causality and propose multi-environment contextual bandits that allow for changes in the underlying mechanisms. We adopt the concept of invariance from the causality literature and introduce the notion of policy invariance. We argue that policy invariance is only relevant if unobserved variables are present and show that, in that case, an optimal invariant policy is guaranteed to generalize across environments under suitable assumptions.
AB - Contextual bandit and reinforcement learning algorithms have been successfully used in various interactive learning systems such as online advertising, recommender systems, and dynamic pricing. However, they have yet to be widely adopted in high-stakes application domains, such as healthcare. One reason may be that existing approaches assume that the underlying mechanisms are static in the sense that they do not change over different environments. In many real-world systems, however, the mechanisms are subject to shifts across environments which may invalidate the static environment assumption. In this paper, we take a step toward tackling the problem of environmental shifts considering the framework of offline contextual bandits. We view the environmental shift problem through the lens of causality and propose multi-environment contextual bandits that allow for changes in the underlying mechanisms. We adopt the concept of invariance from the causality literature and introduce the notion of policy invariance. We argue that policy invariance is only relevant if unobserved variables are present and show that, in that case, an optimal invariant policy is guaranteed to generalize across environments under suitable assumptions.
KW - Causality
KW - contextual bandits
KW - distributional shift
KW - Extraterrestrial measurements
KW - Heuristic algorithms
KW - off-policy learning
KW - Particle measurements
KW - Random variables
KW - Reinforcement learning
KW - Training
KW - Visualization
UR - http://www.scopus.com/inward/record.url?scp=85147223594&partnerID=8YFLogxK
U2 - 10.1109/TPAMI.2022.3232363
DO - 10.1109/TPAMI.2022.3232363
M3 - Journal article
C2 - 37018267
AN - SCOPUS:85147223594
VL - 45
SP - 8606
EP - 8620
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
SN - 0162-8828
IS - 7
ER -
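
The abstract's key claim is that, under suitable assumptions, a policy built on invariant mechanisms generalizes across environments, whereas a policy that exploits environment-dependent associations can fail. A toy simulation can illustrate this idea. The sketch below is purely illustrative: the data-generating process, the variable names, and both policies are assumptions made for this example, not the paper's actual method or experiments.

import numpy as np

rng = np.random.default_rng(0)

def sample_env(n, shift):
    # Toy two-armed contextual bandit with two context variables:
    # x_inv: its effect on the reward is the same in every environment.
    # x_spur: its association with the reward depends on `shift`,
    #         i.e., the environment.
    x_inv = rng.normal(size=n)
    x_spur = rng.normal(size=n)
    # Reward gap of arm 1 over arm 0 (arm 0 always pays 0).
    delta = x_inv + shift * x_spur
    return x_inv, x_spur, delta

def mean_reward(policy, shift, n=100_000):
    # Average reward of a deterministic policy mapping contexts to {0, 1}.
    x_inv, x_spur, delta = sample_env(n, shift)
    return np.mean(policy(x_inv, x_spur) * delta)

# Hypothetical "pooled" policy fit in an environment where shift = +2:
# it relies on the spurious context because it looked predictive there.
pooled = lambda x_inv, x_spur: (x_inv + 2.0 * x_spur > 0).astype(float)

# Invariant policy: uses only the context whose mechanism is stable.
invariant = lambda x_inv, x_spur: (x_inv > 0).astype(float)

for shift in (2.0, 0.0, -2.0):  # shift = -2 reverses the spurious mechanism
    print(f"shift={shift:+.0f}  pooled={mean_reward(pooled, shift):.3f}  "
          f"invariant={mean_reward(invariant, shift):.3f}")

In this setup the invariant policy's value stays near 0.40 in every environment, while the pooled policy's value shrinks and turns negative once the spurious mechanism reverses, mirroring the abstract's point that invariance buys generalization across environmental shifts.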
