Interpreting tree ensemble machine learning models with endoR

Publication: Contribution to journal › Journal article › Research › peer-reviewed

Standard

Interpreting tree ensemble machine learning models with endoR. / Ruaud, Albane; Pfister, Niklas; Ley, Ruth E.; Youngblut, Nicholas D.

In: PLOS Computational Biology, Vol. 18, No. 12, e1010714, 2022.

Harvard

Ruaud, A, Pfister, N, Ley, RE & Youngblut, ND 2022, 'Interpreting tree ensemble machine learning models with endoR', PLOS Computational Biology, vol. 18, no. 12, e1010714. https://doi.org/10.1371/journal.pcbi.1010714

APA

Ruaud, A., Pfister, N., Ley, R. E., & Youngblut, N. D. (2022). Interpreting tree ensemble machine learning models with endoR. PLOS Computational Biology, 18(12), [e1010714]. https://doi.org/10.1371/journal.pcbi.1010714

Vancouver

Ruaud A, Pfister N, Ley RE, Youngblut ND. Interpreting tree ensemble machine learning models with endoR. PLOS Computational Biology. 2022;18(12). e1010714. https://doi.org/10.1371/journal.pcbi.1010714

Author

Ruaud, Albane ; Pfister, Niklas ; Ley, Ruth E. ; Youngblut, Nicholas D. / Interpreting tree ensemble machine learning models with endoR. In: PLOS Computational Biology. 2022 ; Vol. 18, No. 12.

Bibtex

@article{5a30aa6ce30b4bb9af1de1cae9d4f477,
title = "Interpreting tree ensemble machine learning models with endoR",
abstract = "Tree ensemble machine learning models are increasingly used in microbiome science as they are compatible with the compositional, high-dimensional, and sparse structure of sequence-based microbiome data. While such models are often good at predicting phenotypes based on microbiome data, they only yield limited insights into how microbial taxa may be associated. We developed endoR, a method to interpret tree ensemble models. First, endoR simplifies the fitted model into a decision ensemble. Then, it extracts information on the importance of individual features and their pairwise interactions, displaying them as an interpretable network. Both the endoR network and importance scores provide insights into how features, and interactions between them, contribute to the predictive performance of the fitted model. Adjustable regularization and bootstrapping help reduce the complexity and ensure that only essential parts of the model are retained. We assessed endoR on both simulated and real metagenomic data. We found endoR to have comparable accuracy to other common approaches while easing and enhancing model interpretation. Using endoR, we also confirmed published results on gut microbiome differences between cirrhotic and healthy individuals. Finally, we utilized endoR to explore associations between human gut methanogens and microbiome components. Indeed, these hydrogen consumers are expected to interact with fermenting bacteria in a complex syntrophic network. Specifically, we analyzed a global metagenome dataset of 2203 individuals and confirmed the previously reported association between Methanobacteriaceae and Christensenellales. Additionally, we observed that Methanobacteriaceae are associated with a network of hydrogen-producing bacteria. Our method accurately captures how tree ensembles use features and interactions between them to predict a response. As demonstrated by our applications, the resultant visualizations and summary outputs facilitate model interpretation and enable the generation of novel hypotheses about complex systems.",
author = "Ruaud, Albane and Pfister, Niklas and Ley, {Ruth E.} and Youngblut, {Nicholas D.}",
note = "Publisher Copyright: {\textcopyright} 2022 Ruaud et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.",
year = "2022",
doi = "10.1371/journal.pcbi.1010714",
language = "English",
volume = "18",
journal = "PLOS Computational Biology (Online)",
issn = "1553-734X",
publisher = "Public Library of Science",
number = "12",
pages = "e1010714",
}

RIS

TY - JOUR

T1 - Interpreting tree ensemble machine learning models with endoR

AU - Ruaud, Albane

AU - Pfister, Niklas

AU - Ley, Ruth E.

AU - Youngblut, Nicholas D.

N1 - Publisher Copyright: © 2022 Ruaud et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

PY - 2022

Y1 - 2022

N2 - Tree ensemble machine learning models are increasingly used in microbiome science as they are compatible with the compositional, high-dimensional, and sparse structure of sequence-based microbiome data. While such models are often good at predicting phenotypes based on microbiome data, they only yield limited insights into how microbial taxa may be associated. We developed endoR, a method to interpret tree ensemble models. First, endoR simplifies the fitted model into a decision ensemble. Then, it extracts information on the importance of individual features and their pairwise interactions, displaying them as an interpretable network. Both the endoR network and importance scores provide insights into how features, and interactions between them, contribute to the predictive performance of the fitted model. Adjustable regularization and bootstrapping help reduce the complexity and ensure that only essential parts of the model are retained. We assessed endoR on both simulated and real metagenomic data. We found endoR to have comparable accuracy to other common approaches while easing and enhancing model interpretation. Using endoR, we also confirmed published results on gut microbiome differences between cirrhotic and healthy individuals. Finally, we utilized endoR to explore associations between human gut methanogens and microbiome components. Indeed, these hydrogen consumers are expected to interact with fermenting bacteria in a complex syntrophic network. Specifically, we analyzed a global metagenome dataset of 2203 individuals and confirmed the previously reported association between Methanobacteriaceae and Christensenellales. Additionally, we observed that Methanobacteriaceae are associated with a network of hydrogen-producing bacteria. Our method accurately captures how tree ensembles use features and interactions between them to predict a response. As demonstrated by our applications, the resultant visualizations and summary outputs facilitate model interpretation and enable the generation of novel hypotheses about complex systems.

UR - http://www.scopus.com/inward/record.url?scp=85144606072&partnerID=8YFLogxK

U2 - 10.1371/journal.pcbi.1010714

DO - 10.1371/journal.pcbi.1010714

M3 - Journal article

C2 - 36516158

AN - SCOPUS:85144606072

VL - 18

JO - PLOS Computational Biology (Online)

JF - PLOS Computational Biology (Online)

SN - 1553-734X

IS - 12

M1 - e1010714

ER -
