Graphical models for zero-inflated single cell gene expression
Research output: Contribution to journal › Journal article › Research › peer-review
Bulk gene expression experiments relied on aggregations of thousands of cells to measure the average expression in an organism. Advances in microfluidic and droplet sequencing now permit expression profiling in single cells. This study of cell-to-cell variation reveals that individual cells lack detectable expression of transcripts that appear abundant on a population level, giving rise to zero-inflated expression patterns. To infer gene coregulatory networks from such data, we propose a multivariate Hurdle model. It comprises a mixture of singular Gaussian distributions. We employ neighborhood selection with the pseudo-likelihood and a group lasso penalty to select and fit undirected graphical models that capture conditional independences between genes. The proposed method is more sensitive than existing approaches in simulations, even under departures from our Hurdle model. The method is applied to data for T follicular helper cells, and a high-dimensional profile of mouse dendritic cells. It infers network structure not revealed by other methods, or in bulk data sets. An R implementation is available at https://github.com/amcdavid/HurdleNormal.
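The group lasso penalty mentioned in the abstract can be illustrated with a minimal Python sketch of block soft-thresholding, the standard proximal step for a group lasso penalty. This is not code from the HurdleNormal package. The function name, the two-coefficient groups, and the numeric values are assumptions made for illustration; the sketch only assumes that each candidate edge between two genes is described by a group of several coefficients, so that an edge is dropped when its whole group is shrunk to zero.

```python
import math


def group_soft_threshold(group, threshold):
    """Block soft-thresholding: proximal operator of threshold * ||group||_2.

    Shrinks the whole coefficient group toward zero and sets it exactly
    to zero when its Euclidean norm does not exceed the threshold.
    """
    norm = math.sqrt(sum(v * v for v in group))
    if norm <= threshold:
        # The entire group is removed, i.e. no edge is selected.
        return [0.0 for _ in group]
    scale = 1.0 - threshold / norm
    return [scale * v for v in group]


# Hypothetical coefficient groups for two candidate edges of one gene.
strong_edge = [3.0, 4.0]  # Euclidean norm 5.0
weak_edge = [0.3, 0.4]    # Euclidean norm about 0.5

shrunk_strong = group_soft_threshold(strong_edge, 1.0)
shrunk_weak = group_soft_threshold(weak_edge, 1.0)
```

With a threshold of 1.0, the strong group is scaled down but kept, while the weak group is set to zero as a unit. Penalizing groups rather than single coefficients is what lets all coefficients of an edge enter or leave the model together.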
Original language | English |
---|---|
Journal | Annals of Applied Statistics |
Volume | 13 |
Issue number | 2 |
Pages (from-to) | 848-873 |
Number of pages | 26 |
ISSN | 1932-6157 |
Publication status | Published - 2019 |
Research areas
- Gene network, Graphical model, Group lasso, Single cell gene expression
ID: 226951301