News and information
Time and Place
The teaching period is September 5 to November 4. Lectures and exercises in Copenhagen take place in week 38 (September 19 to September 23) and week 41 (October 10 to October 14). The remaining weeks are set aside for preparation, homework exercises and individual projects.
The lectures and exercises will take place at
HCØ
University of Copenhagen
Universitetsparken 5
DK-2100 Copenhagen Ø
Course Description
Remark: The primary literature for the course is the book:
Hastie, T., Tibshirani, R., and Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2nd ed., 2009.
Note that this book is freely available as a PDF file from the book's web page.
The main topics of this course are models and methods suitable for analyzing high-dimensional data, where there are typically many features compared to replications. This situation is typical in bioinformatics and is exemplified by gene expression data, where we analyze experiments with thousands of parallel measurements and few replications. The course focuses on supervised learning, where typical approaches to high-dimensional data analysis involve flexible models combined with shrinkage or regularization methods, such as ridge or lasso regression, perhaps combined with basis expansion techniques such as spline regression and smoothing splines. Non-generative models such as classification and regression trees are also useful for prediction purposes. In the course we start with linear methods for regression and classification and move on to more advanced topics.
Access to good statistical software is paramount. We will therefore illustrate the use of the models throughout the course with methods implemented in R, and the course will train the participants in using R and Bioconductor software for the analysis of genomic data.
Credit
Participants who pass the final project will receive a certificate of participation.
Prerequisites
The participants are expected to know the theory for the multivariate normal distribution, ordinary multiple regression and linear normal models, and in particular the linear algebra associated with these models. Participants also need to be comfortable with random variables, probability measures, expectations and conditional expectations. Although the course by no means focuses on a formal, measure-theoretic approach, the book uses, for example, expectations and conditional expectations and their computational rules.
Participants also need some prior experience with R and an interest in practical applications to biological questions. You need to know the fundamental data structures such as vectors, lists and data frames and the fundamental functions such as lm for linear models, and you will probably also need to know how to produce graphics.
The participants are also expected to bring their own laptop for the exercises. We require that all participants install the latest version of R and the latest version of Bioconductor prior to the course (the exact releases will be announced on this web page once settled). For the course we will use R version 2.13.1. Here is a list of some additional packages that you might want to install right away.
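As a rough self-check of the R prerequisites mentioned above (vectors, lists, data frames, lm and basic graphics), a sketch like the following should feel familiar. It is only an illustration and not part of the course material: the data are simulated, and all object names (x, y, d, info, fit) are made up for the example. It uses base R only.

```r
# Simulated toy data; every name below is hypothetical and for illustration only.
set.seed(1)
n <- 50
x <- rnorm(n)                         # a numeric vector
y <- 1 + 2 * x + rnorm(n, sd = 0.5)   # response with known intercept 1 and slope 2

# A data frame stores the variables as columns of equal length.
d <- data.frame(x = x, y = y)

# A list can hold objects of different types and lengths.
info <- list(n = n, variables = names(d))

# Ordinary linear regression with lm; coef() extracts the estimates.
fit <- lm(y ~ x, data = d)
print(coef(fit))

# Basic graphics: scatter plot of the data with the fitted line added.
plot(y ~ x, data = d, main = "Simulated data with least squares fit")
abline(fit)
```

If each line here is easy to read and modify, for instance adding a second predictor to the formula or extracting residuals with residuals(fit), the R background is probably sufficient for the exercises.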
Program
September 5 - 16: Preparation at home.
Below you will find information on which sections of the book we cover and when. There will also be a number of practical exercises, which will be made available during the course. They consist mostly of small R exercises that train the use of R on various problems. Usually you will be given approximately 30-45 minutes to solve each exercise on your own computer. Solutions will be provided.
Registration
To register for the course, send an email to Niels Richard Hansen. The number of participants in the course is limited to 20 students. In case of overbooking, students from the universities participating in the BGC-network will be given priority.
Miscellaneous
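Material
Primary literature for the course is The Elements of Statistical Learning by Hastie, Tibshirani and Friedman (Springer, 2nd ed., 2009).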
See also the web page for the book The Elements of Statistical Learning for links to data, R resources, errata, etc. For additional reading we recommend the books:
Directions and accommodation
Please find information on directions and accommodation on our website. Note that we are unable to offer financial support to participants.