High dimensional multiclass classification with applications to cancer diagnosis

Publication: Book/anthology/thesis/report › Ph.D. thesis › Research

Standard

High dimensional multiclass classification with applications to cancer diagnosis. / Vincent, Martin.

Department of Mathematical Sciences, Faculty of Science, University of Copenhagen, 2013.

Harvard

Vincent, M 2013, High dimensional multiclass classification with applications to cancer diagnosis. Department of Mathematical Sciences, Faculty of Science, University of Copenhagen.

APA

Vincent, M. (2013). High dimensional multiclass classification with applications to cancer diagnosis. Department of Mathematical Sciences, Faculty of Science, University of Copenhagen.

Vancouver

Vincent M. High dimensional multiclass classification with applications to cancer diagnosis. Department of Mathematical Sciences, Faculty of Science, University of Copenhagen, 2013.

Author

Vincent, Martin. / High dimensional multiclass classification with applications to cancer diagnosis. Department of Mathematical Sciences, Faculty of Science, University of Copenhagen, 2013.

Bibtex

@phdthesis{16b92254acc44bba9293e934215616c6,
title = "High dimensional multiclass classification with applications to cancer diagnosis",
abstract = "Probabilistic classifiers are introduced and it is shown that the only regular linear probabilistic classifier with convex risk is multinomial regression. Penalized empirical risk minimization is introduced and used to construct supervised learning methods for probabilistic classifiers. A sparse group lasso penalized approach to high dimensional multinomial classification is presented. On different real data examples it is found that this approach clearly outperforms multinomial lasso in terms of error rate and features included in the model. An efficient coordinate descent algorithm is developed and its convergence is established. This algorithm is implemented in the msgl R package. Examples of high dimensional multiclass problems are studied, in particular examples of multiclass classification based on gene expression measurements. One such example is the clinically important problem of identifying the primary tumor site of liver metastases; this particular problem is studied in detail. In order to adjust for the liver contamination found in biopsies of metastases, a computational contamination model is developed. The contamination model is presented in a domain adaptation framework and a simulation-based domain adaptation strategy is presented. It is shown that the presented computational contamination approach drastically improves the primary tumor site classification of liver-contaminated biopsies of metastases. A final classifier for identification of the primary tumor site is developed. This classifier is validated on an independent validation set consisting of liver biopsies of metastases with varying tumor content.",
author = "Martin Vincent",
year = "2013",
language = "English",
publisher = "Department of Mathematical Sciences, Faculty of Science, University of Copenhagen",

}

RIS

TY - BOOK

T1 - High dimensional multiclass classification with applications to cancer diagnosis

AU - Vincent, Martin

PY - 2013

Y1 - 2013

N2 - Probabilistic classifiers are introduced and it is shown that the only regular linear probabilistic classifier with convex risk is multinomial regression. Penalized empirical risk minimization is introduced and used to construct supervised learning methods for probabilistic classifiers. A sparse group lasso penalized approach to high dimensional multinomial classification is presented. On different real data examples it is found that this approach clearly outperforms multinomial lasso in terms of error rate and features included in the model. An efficient coordinate descent algorithm is developed and its convergence is established. This algorithm is implemented in the msgl R package. Examples of high dimensional multiclass problems are studied, in particular examples of multiclass classification based on gene expression measurements. One such example is the clinically important problem of identifying the primary tumor site of liver metastases; this particular problem is studied in detail. In order to adjust for the liver contamination found in biopsies of metastases, a computational contamination model is developed. The contamination model is presented in a domain adaptation framework and a simulation-based domain adaptation strategy is presented. It is shown that the presented computational contamination approach drastically improves the primary tumor site classification of liver-contaminated biopsies of metastases. A final classifier for identification of the primary tumor site is developed. This classifier is validated on an independent validation set consisting of liver biopsies of metastases with varying tumor content.

AB - Probabilistic classifiers are introduced and it is shown that the only regular linear probabilistic classifier with convex risk is multinomial regression. Penalized empirical risk minimization is introduced and used to construct supervised learning methods for probabilistic classifiers. A sparse group lasso penalized approach to high dimensional multinomial classification is presented. On different real data examples it is found that this approach clearly outperforms multinomial lasso in terms of error rate and features included in the model. An efficient coordinate descent algorithm is developed and its convergence is established. This algorithm is implemented in the msgl R package. Examples of high dimensional multiclass problems are studied, in particular examples of multiclass classification based on gene expression measurements. One such example is the clinically important problem of identifying the primary tumor site of liver metastases; this particular problem is studied in detail. In order to adjust for the liver contamination found in biopsies of metastases, a computational contamination model is developed. The contamination model is presented in a domain adaptation framework and a simulation-based domain adaptation strategy is presented. It is shown that the presented computational contamination approach drastically improves the primary tumor site classification of liver-contaminated biopsies of metastases. A final classifier for identification of the primary tumor site is developed. This classifier is validated on an independent validation set consisting of liver biopsies of metastases with varying tumor content.

M3 - Ph.D. thesis

BT - High dimensional multiclass classification with applications to cancer diagnosis

PB - Department of Mathematical Sciences, Faculty of Science, University of Copenhagen

ER -

ID: 97016368