Annual Meeting in the statistics network

March 27-28, 2012

Comwell hotel, Holte

How to get to the hotel by public transportation: Go to "Holte station", e.g. by S-train, then take bus 334 to "Kongevejen/Vasevej". The Comwell hotel is located 200 meters further down Kongevejen.
Program:
Tuesday 27/3
8:45 - 9:15 Coffee with bread, juice and fruit
9:15 - 10:30 Bioinformatics
Jes Frellsen
Ole Winther
10:30 - 11:00 Image analysis
Anders Rønn-Nielsen
11:00 - 12:30 Lunch
12:30 - 15:00 Break
15:00 - 16:00 Functional data analysis
Bo Markussen
Andreas Kryger Jensen
16:00 - 16:30 Statistical computing
Klaus Holst
16:30 - 17:00 Survival analysis
Christian Pipper
17:00 - 17:15 Break
17:15 - 18:15 Invited talk
Mads Nielsen
Image analysis
19:00 - Dinner

Wednesday 28/3

9:15 - 10:30 Stochastic dynamic models I
Anders Christian Jensen
Helle Sørensen
10:30 - 11:00 Break
11:00 - 12:15 Stochastic dynamic models II
Alexander Sokol
Julie Lyng Forman
12:15 - 13:30 Lunch
13:30 - 14:30 Invited talk
Kresten Lindorff-Larsen
Elucidating Structural and Folding Dynamics of Proteins by Molecular Dynamics Simulations

Talks and Abstracts

Tuesday 27/3

9:15-10:30 Bioinformatics

Jes Frellsen
Estimation of generalized ensemble weights in MCMC simulations.

Joint work with Jesper Ferkinghoff-Borg

Probabilistic models used in bioinformatics, machine learning and statistical physics are often not analytically tractable. The Markov chain Monte Carlo (MCMC) method is one of the most important tools for approximate inference in such models. However, the standard Metropolis-Hastings algorithm can suffer from the generic deficiency of poor mixing. This deficiency can be addressed with generalized ensemble methods. In this talk I will present an automated, histogram-based, maximum likelihood method for estimating generalized ensemble weights. I will conclude the talk by presenting applications of the method and comparisons to other methods.
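The poor mixing mentioned in the abstract can be illustrated with a small sketch. This is not the method of the talk: it is a plain random-walk Metropolis sampler on a hypothetical bimodal target, and the target, step sizes, seed and function names (log_target, random_walk_metropolis) are all assumptions made for illustration.

```python
import numpy as np

def log_target(x):
    """Unnormalized log-density of an equal mixture of N(-5, 0.5^2) and N(5, 0.5^2).
    (Hypothetical target chosen only to have two well-separated modes.)"""
    return np.logaddexp(-0.5 * ((x + 5.0) / 0.5) ** 2,
                        -0.5 * ((x - 5.0) / 0.5) ** 2)

def random_walk_metropolis(log_density, x0, step, n_steps, rng):
    """Random-walk Metropolis sampler with Gaussian proposals of scale `step`."""
    chain = np.empty(n_steps)
    x, lp = x0, log_density(x0)
    for i in range(n_steps):
        proposal = x + step * rng.standard_normal()
        lp_proposal = log_density(proposal)
        # Accept with probability min(1, pi(proposal) / pi(x)).
        if np.log(rng.uniform()) < lp_proposal - lp:
            x, lp = proposal, lp_proposal
        chain[i] = x
    return chain

rng = np.random.default_rng(0)
narrow = random_walk_metropolis(log_target, -5.0, 0.5, 5000, rng)
wide = random_walk_metropolis(log_target, -5.0, 10.0, 5000, rng)

# Fraction of samples in the right-hand mode (the ideal value is 0.5).
print("narrow proposal:", np.mean(narrow > 0))
print("wide proposal:  ", np.mean(wide > 0))
```

Under these assumptions the narrow-proposal chain is expected to stay in the mode where it starts, while the wide proposal does cross between modes but rejects most of its moves. Generalized ensemble methods aim to ease this kind of trade-off by sampling from a reweighted target.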
Ole Winther
Bioinformatics for genomic medicine

This talk will give an overview of the collaborative work between the Bioinformatics Centre, KU and Genomic Medicine and Oncology, Copenhagen University Hospital (Riget) on using genomic data for cancer diagnosis and treatment. Cancers of unknown primary tumor (CUP or UPT) constitute 2-5% of all cancer cases. Being able to classify these by their type of origin is an important aid for oncologists in deciding what treatment to use. A large dataset of around 2000 gene expression profiles covering 16 tumor types was used to train a linear discriminant classifier that achieved a prediction accuracy of approximately 90%. I will discuss the steps of optimizing the classifier for this type of high-dimensional (50k) data, how to determine to what degree a new sample is similar to the ones in the data (i.e. are CUP samples similar to the primary cancer samples we trained on?), how to present the predictions to the clinicians, and low-rank methods for robust estimation of high-dimensional covariance matrices.

In a second study we consider gastric (stomach) cancer prognosis. We use a gene set enrichment method to map gene expression data to gene set features representing activation of genetic pathways. A cohort of 200 gastric cancer samples is used to train a random survival forest model. I will report on the performance and on how we envision using model predictions based on clinical and genomic data in the clinic.
10:30-11:00 Image analysis

Anders Rønn-Nielsen
Lévy based modelling in brain imaging

Joint work with Kristiana Ýr Jónsdóttir, Kim Mouridsen and Eva B. Vedel Jensen

Traditional methods of analysis in brain imaging based on Gaussian random field theory may leave small but significant changes in the signal level undetected, because the assumption of Gaussianity is not fulfilled. In group comparisons, the number of subjects in each group is usually small, so the alternative strategy of using a non-parametric test may not be appropriate either, because of low power. We propose to use a flexible, yet tractable model for a random field, based on kernel smoothing of a so-called Lévy basis. The resulting field may be Gaussian, but there are many other possibilities, e.g. random fields based on Gamma, inverse Gaussian and normal inverse Gaussian (NIG) Lévy bases. We show that it is easy to estimate the parameters of the model and accordingly to assess by simulation the quantiles of a test statistic. A finding of independent interest is the explicit form of the kernel function that induces a covariance function belonging to the Matérn family.
15:00-16:00 Functional data analysis

Bo Markussen
Can functional regression be done without regularization?

The statistical framework of this talk is linear functional regression, where a univariate normal response is regressed on a functional covariate. The general opinion among specialists in functional data analysis is that regularization methods like penalized likelihood are needed for functional regression to work. My objective is to convince the audience that regularization is not necessary if we return to the continuous domain and replace the matrix computations by their operator counterparts.
Andreas Kryger Jensen
From high-dimensional data to functional data - embracing the "curse of dimensionality"

High-dimensional data analysis and functional data analysis are two somewhat similar statistical domains that both deal with the problem of having many more predictors than observations.

A majority of methods in the literature for attacking this problem correspond to finding a solution in a certain kernel space possessing a reproducing property through solving some penalized quasi-score equation. The differences between the two domains lie in the cardinality of the index set and the underlying topology.

A recent article [1] introduced an approach for analyzing high-dimensional data as functional data. We shall discuss their method, compare it to common and new methods, and review the literature on mapping data into function space. We present results from an extensive simulation study as well as an example of analyzing real-world proteomics data.

[1] Kun Chen, Kehui Chen, Hans-Georg Müller and Jane-Ling Wang. Stringing High-Dimensional Data for Functional Analysis. Journal of the American Statistical Association, 106(493), 2011.
16:00-16:30 Statistical computing

Klaus Holst
16:30-17:00 Survival analysis

Christian Pipper
Estimation of Odds of Concordance based on the Aalen additive model

When analyzing time-to-event data, the odds of concordance may provide a simple and appealing summary measure of effect. One advantage is that, contrary to the much used hazard ratios in Cox regression, the odds of concordance does not require a valid model to be meaningful. In this talk we review some current methods for estimation of odds of concordance and their shortcomings. We then propose a modified odds of concordance measure and provide a simple estimation procedure based on the Aalen additive model.
17:15-18:15 Invited talk

Mads Nielsen
Wednesday 28/3

9:15-10:30 Stochastic dynamic models I

Anders Christian Jensen
A Markov Chain Monte Carlo approach to parameter estimation in the FitzHugh-Nagumo model

For all but a few diffusion models an explicit expression for the transition density, and thus the likelihood function, is not available. This leaves the preferred strategy for parameter estimation an open question. There are many methods that deal with this problem, and they tend to become highly complicated to implement in practice, especially when the diffusion is multidimensional. Within the last decade, novel Bayesian methods have been developed which can be used for statistical inference. We describe one such Markov Chain Monte Carlo method and adapt it to parameter inference in the two-dimensional stochastic FitzHugh-Nagumo model.
Helle Sørensen
Stochastic differential equations with random effects

Joint work with Susanne Ditlevsen.

We consider data consisting of samples of discretely observed diffusion processes. The model set-up is hierarchical: (1) For each sample (subject), the diffusion process is defined by a parametric stochastic differential equation; and (2) the parameters - or at least some of them - are random. The talk is about estimation of the parameters, including those in the distribution of the random effects. We suggest replacing the exact one-step-ahead transition densities in the likelihood function with Gaussian approximations and maximizing the corresponding pseudo-likelihood. In the talk, emphasis will be on the square-root (or Cox-Ingersoll-Ross) process with random drift parameters. We present simulation results and apply the methods to data on pig growth.
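The Gaussian pseudo-likelihood idea can be sketched for a single square-root (Cox-Ingersoll-Ross) path with fixed, non-random parameters. The abstract does not specify which Gaussian approximation is used or how the random effects are handled, so everything below is an assumption: the parametrization dX = (a - bX) dt + sigma sqrt(X) dW, the Euler-type conditional mean and variance, and the function names simulate_cir and gaussian_pseudo_loglik.

```python
import numpy as np

def simulate_cir(a, b, sigma, x0, dt, n_steps, rng):
    """Simulate dX = (a - b X) dt + sigma sqrt(X) dW with an Euler scheme.
    Negative excursions are truncated at zero inside the drift and the sqrt."""
    x = np.empty(n_steps + 1)
    x[0] = x0
    for i in range(n_steps):
        xp = max(x[i], 0.0)
        noise = rng.standard_normal()
        x[i + 1] = x[i] + (a - b * xp) * dt + sigma * np.sqrt(xp * dt) * noise
    return x

def gaussian_pseudo_loglik(params, x, dt):
    """Pseudo-log-likelihood in which each one-step transition density is
    replaced by a Gaussian with Euler-type mean and variance (an assumption)."""
    a, b, sigma = params
    prev, nxt = x[:-1], x[1:]
    mean = prev + (a - b * prev) * dt
    # Floor the variance to keep the logarithm finite if the path touches zero.
    var = sigma ** 2 * np.maximum(prev, 1e-12) * dt
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (nxt - mean) ** 2 / var)

rng = np.random.default_rng(1)
true_params = (2.0, 1.0, 0.5)  # hypothetical values of (a, b, sigma)
path = simulate_cir(*true_params, x0=2.0, dt=0.01, n_steps=2000, rng=rng)

ll_true = gaussian_pseudo_loglik(true_params, path, 0.01)
ll_wrong = gaussian_pseudo_loglik((2.0, 1.0, 1.5), path, 0.01)
print("pseudo-loglik at simulating parameters:", ll_true)
print("pseudo-loglik with inflated sigma:     ", ll_wrong)
```

In practice the pseudo-log-likelihood would be maximized numerically over the parameters. With random drift parameters, the per-subject contributions would additionally have to be integrated over the random-effects distribution, which this sketch does not attempt.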
11:00-12:15 Stochastic dynamic models II

Alexander Sokol
Exponential martingales and changes of measure for counting processes

As noted by Gjessing et al. (2010), when formulating statistical models for counting processes in terms of a candidate intensity on general filtered probability spaces, it is often essential to have conditions on the intensity which ensure non-explosion of the processes. We show sufficient criteria for the uniform integrability of specific exponential martingales and use these to obtain sufficient criteria for non-explosion. In particular, the final criterion in Gjessing et al. (2010) for non-explosion is extended from α > 1 to α ≥ 1.
Julie Lyng Forman
Testing the Markov hypothesis by nonparametric bootstrapping of a diffusion process

Nonparametric tests of the Markov hypothesis for a discretely observed stochastic process were considered by [1], who proved that the proposed test statistic is asymptotically chi-square distributed, but at the same time observed that the approximation to the chi-square distribution is poor in finite samples. For the purpose of testing the Markov property of protein reaction coordinates, I propose a nonparametric bootstrap for a diffusion process based on the local linear estimator of the conditional distribution function. Pilot studies indicate that the nonparametric bootstrap gives a good approximation to the distribution of the Markov test statistic proposed by [1]. I will discuss bandwidth selection for the nonparametric estimators and present preliminary results from the analysis of the protein reaction coordinates: Were the data generated by a Markov process?

[1] Ait-Sahalia, Y., Fan, J. and Jiang, J. Nonparametric tests of the Markov hypothesis in continuous-time models. The Annals of Statistics, 38, 3129-3163, 2010.
13:30-14:30 Invited talk

Kresten Lindorff-Larsen
Elucidating Structural and Folding Dynamics of Proteins by Molecular Dynamics Simulations

All-atom molecular dynamics simulations provide a vehicle for capturing the structures, motions, and interactions of biological macromolecules in full atomic detail. Such simulations have, however, been limited both in the timescales they could access and in the accuracy of the computational models used in the simulations. I will begin by presenting briefly how progress has been made in both of these areas so that it is now possible to access the millisecond timescale, and how we have been able to parameterize relatively accurate energy functions. I will then present recent results that highlight how such long-timescale simulations have been used to provide insight into protein dynamics.

In the area of protein folding, I will explain how we have used simulations to describe the general principles of how fast-folding proteins fold. I will also describe how simulations can be used to describe slow motions present in proteins, in both folded and unfolded states. I will also give examples of how we analyze simulation data and the types of problems that we face in the analyses.