Graphical models for zero-inflated single cell gene expression
Publication: Contribution to journal › Journal article › Research › peer-reviewed
Standard
Graphical models for zero-inflated single cell gene expression. / McDavid, Andrew; Gottardo, Raphael; Simon, Noah; Drton, Mathias.
In: Annals of Applied Statistics, Vol. 13, No. 2, 2019, pp. 848-873.
RIS
TY - JOUR
T1 - Graphical models for zero-inflated single cell gene expression
AU - McDavid, Andrew
AU - Gottardo, Raphael
AU - Simon, Noah
AU - Drton, Mathias
PY - 2019
Y1 - 2019
N2 - Bulk gene expression experiments relied on aggregations of thousands of cells to measure the average expression in an organism. Advances in microfluidic and droplet sequencing now permit expression profiling in single cells. This study of cell-to-cell variation reveals that individual cells lack detectable expression of transcripts that appear abundant on a population level, giving rise to zero-inflated expression patterns. To infer gene coregulatory networks from such data, we propose a multivariate Hurdle model. It is comprised of a mixture of singular Gaussian distributions. We employ neighborhood selection with the pseudo-likelihood and a group lasso penalty to select and fit undirected graphical models that capture conditional independences between genes. The proposed method is more sensitive than existing approaches in simulations, even under departures from our Hurdle model. The method is applied to data for T follicular helper cells, and a high-dimensional profile of mouse dendritic cells. It infers network structure not revealed by other methods, or in bulk data sets. An R implementation is available at https://github.com/amcdavid/HurdleNormal.
AB - Bulk gene expression experiments relied on aggregations of thousands of cells to measure the average expression in an organism. Advances in microfluidic and droplet sequencing now permit expression profiling in single cells. This study of cell-to-cell variation reveals that individual cells lack detectable expression of transcripts that appear abundant on a population level, giving rise to zero-inflated expression patterns. To infer gene coregulatory networks from such data, we propose a multivariate Hurdle model. It is comprised of a mixture of singular Gaussian distributions. We employ neighborhood selection with the pseudo-likelihood and a group lasso penalty to select and fit undirected graphical models that capture conditional independences between genes. The proposed method is more sensitive than existing approaches in simulations, even under departures from our Hurdle model. The method is applied to data for T follicular helper cells, and a high-dimensional profile of mouse dendritic cells. It infers network structure not revealed by other methods, or in bulk data sets. An R implementation is available at https://github.com/amcdavid/HurdleNormal.
KW - Gene network
KW - Graphical model
KW - Group lasso
KW - Single cell gene expression
UR - http://www.scopus.com/inward/record.url?scp=85068503200&partnerID=8YFLogxK
U2 - 10.1214/18-AOAS1213
DO - 10.1214/18-AOAS1213
M3 - Journal article
C2 - 31388390
AN - SCOPUS:85068503200
VL - 13
SP - 848
EP - 873
JO - Annals of Applied Statistics
JF - Annals of Applied Statistics
SN - 1932-6157
IS - 2
ER -
ID: 226951301