
Tuesday 27/3

9:15-10:30
Bioinformatics


Jes Frellsen Estimation of generalized ensemble weights in MCMC simulations.
Joint work with Jesper Ferkinghoff-Borg
Probabilistic models used in bioinformatics, machine learning and statistical physics are often not analytically tractable. The Markov chain Monte Carlo (MCMC) method is one of the most important tools for approximate inference in such models. However, the standard Metropolis-Hastings algorithm can suffer from the generic deficiency of poor mixing. This deficiency can be addressed with generalized ensemble methods. In this talk I will present an automated, histogram-based, maximum likelihood method for estimating generalized ensemble weights. I will conclude the talk by presenting applications of the method and comparisons to other methods.
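
To illustrate the mixing problem mentioned in the abstract (this is not the method presented in the talk), a minimal random-walk Metropolis-Hastings sketch could look as follows. The bimodal target, the step size and all function names are assumptions chosen for illustration only.

```python
import math
import random


def log_target(x):
    """Log-density (up to a constant) of an equal mixture of N(-4, 1) and N(4, 1).

    The target is a hypothetical example, not taken from the talk.
    """
    a = -0.5 * (x + 4.0) ** 2
    b = -0.5 * (x - 4.0) ** 2
    m = max(a, b)  # log-sum-exp trick for numerical stability
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def metropolis_hastings(n_steps, step_size, x0=-4.0, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    x = x0
    samples = []
    accepted = 0
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step_size)
        log_alpha = log_target(proposal) - log_target(x)
        # Accept with probability min(1, exp(log_alpha)).
        if log_alpha >= 0.0 or rng.random() < math.exp(log_alpha):
            x = proposal
            accepted += 1
        samples.append(x)
    return samples, accepted / n_steps


if __name__ == "__main__":
    samples, rate = metropolis_hastings(n_steps=5000, step_size=0.5)
    share_right = sum(s > 0.0 for s in samples) / len(samples)
    print(f"acceptance rate: {rate:.2f}, share of samples in right mode: {share_right:.2f}")
```

With a small step size, a chain started in one mode will typically stay there for a long time even though both modes carry equal mass. This is the kind of poor mixing that generalized ensemble methods aim to address.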

Ole Winther Bioinformatics for genomic medicine
This talk will give an overview of the collaborative work between the Bioinformatics Centre, KU and Genomic Medicine and Oncology, Copenhagen University Hospital (Riget) on using genomic data for cancer diagnosis and treatment. Cancers of unknown primary tumor (CUP or UPT) constitute 2-5% of all cancer cases. Being able to classify these by their type of origin is an important aid for oncologists in deciding what treatment to use. A large dataset of around 2000 gene expression profiles covering 16 tumor types was used to train a linear discriminant classifier that achieved a prediction accuracy of ~90%. I will discuss the steps of optimizing the classifier for this type of high-dimensional (50k) data, how to determine to what degree a new sample is similar to the ones in the data (i.e. are CUP samples similar to the primary cancer samples we trained on?), how to present the predictions to the clinicians, and low-rank methods for robust estimation of high-dimensional covariance matrices. In a second study we consider gastric (stomach) cancer prognosis. We use a gene set enrichment method to map gene expression data to gene set features representing activation of genetic pathways. A cohort of 200 gastric cancer samples is used to train a random survival forest model. I will report on the performance and on how we envision using model predictions based on clinical and genomic data in the clinic.
10:30-11:00
Image analysis


Anders Rønn-Nielsen Lévy based modelling in brain imaging
Joint work with Kristiana Ýr Jónsdóttir, Kim Mouridsen and Eva B. Vedel Jensen
Traditional methods of analysis in brain imaging based on Gaussian random field theory may leave small but significant changes in the signal level undetected, because the assumption of Gaussianity is not fulfilled. In group comparisons, the number of subjects in each group is usually small, so the alternative strategy of using a nonparametric test may not be appropriate either because of low power. We propose to use a flexible, yet tractable model for a random field, based on kernel smoothing of a so-called Lévy basis. The resulting field may be Gaussian but there are many other possibilities, e.g. random fields based on Gamma, inverse Gaussian and normal inverse Gaussian (NIG) Lévy bases. We show that it is easy to estimate the parameters of the model and accordingly to assess by simulation the quantiles of a test statistic. A finding of independent interest is the explicit form of the kernel function that induces a covariance function belonging to the Matérn family.
15:00-16:00
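
For reference, one common parametrization of the Matérn covariance function is sketched below. The abstract does not state which parametrization is used, so the symbols (variance $\sigma^2$, smoothness $\nu$, range $\rho$, distance $h$) are assumptions.

```latex
C(h) = \sigma^2 \, \frac{2^{1-\nu}}{\Gamma(\nu)}
       \left( \frac{\sqrt{2\nu}\, h}{\rho} \right)^{\nu}
       K_\nu\!\left( \frac{\sqrt{2\nu}\, h}{\rho} \right), \qquad h > 0,
```

where $K_\nu$ is the modified Bessel function of the second kind and $C(0) = \sigma^2$.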
Functional data analysis


Bo Markussen Can functional regression be done without regularization?
The statistical framework of this talk is linear functional regression, where a univariate normal response is regressed on a functional covariate. The general opinion among specialists on functional data analysis is that regularization methods like penalized likelihood are needed for functional regression to work. My objective is to convince the audience that regularization is not necessary if we return to the continuous domain and replace the matrix computations by their operator counterparts. 

Andreas Kryger Jensen From high-dimensional data to functional data - embracing the "curse of dimensionality"
High-dimensional data analysis and functional data analysis are two somewhat similar statistical domains that both deal with the problem of having many more predictors than observations.
A majority of methods in the literature for attacking this problem correspond to finding a solution in a certain kernel space possessing a reproducing property through solving some penalized quasi-score equation. The differences between the two domains lie in the cardinality of the index set and the underlying topology.
A recent article [1] introduced an approach for analyzing high-dimensional data as functional data. We shall discuss their method, compare it to common and new methods, and review the literature on mapping data into function space. We present results from an extensive simulation study as well as an example of analyzing real-world proteomics data.
[1] Kun Chen, Kehui Chen, Hans-Georg Müller and Jane-Ling Wang. Stringing High-Dimensional Data for Functional Analysis. JASA Volume 106, Issue 493, 2011.
16:00-16:30
Statistical computing


Klaus Holst 
16:30-17:00
Survival analysis


Christian Pipper Estimation of Odds of Concordance based on the Aalen additive model
When analyzing time-to-event data, the odds of concordance may provide a simple and appealing summary measure of effect. One advantage is that, contrary to the much-used hazard ratios in Cox regression, the odds of concordance does not require a valid model to be meaningful. In this talk we review some current methods for estimation of odds of concordance and their shortcomings. We then propose a modified odds of concordance measure and provide a simple estimation procedure based on the Aalen additive model.
17:15-18:15
Invited talk


Mads Nielsen 

Wednesday 28/3

9:15-10:30
Stochastic dynamic models I


Anders Christian Jensen A Markov Chain Monte Carlo approach to parameter estimation in the FitzHugh-Nagumo model
For all but a few diffusion models an explicit expression for the transition density, and thus the likelihood function, is not available. This leaves the preferred strategy for parameter estimation an open question. There are many methods that deal with this problem, and they tend to become highly complicated to implement in practice, especially when the diffusion is multidimensional. Within the last decade novel Bayesian methods have been developed which can be used for statistical inference. We describe one such Markov Chain Monte Carlo method and adapt it to the two-dimensional stochastic FitzHugh-Nagumo model for parameter inference.

Helle Sørensen Stochastic differential equations with random effects
Joint work with Susanne Ditlevsen.
We consider data consisting of samples of discretely observed diffusion processes. The model setup is hierarchical: (1) For each sample (subject), the diffusion process is defined by a parametric stochastic differential equation; and (2) the parameters - or at least some of them - are random. The talk is about estimation of the parameters, including those in the distribution of the random effects. We suggest replacing the correct one-step-ahead transition densities in the likelihood function with Gaussian approximations and maximizing the corresponding pseudo-likelihood. In the talk, emphasis will be on the square-root (or Cox-Ingersoll-Ross) process with random drift parameters. We present simulation results and apply the methods to data on pig growth.
11:00-12:15
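
The idea of a Gaussian pseudo-likelihood can be sketched for a single subject without random effects. The example below assumes the parametrization dX = (a - b X) dt + sigma sqrt(X) dW and an Euler-type Gaussian approximation of the transition density. Both are illustrative assumptions, and the approximation used in the talk may differ.

```python
import math
import random


def simulate_cir(a, b, sigma, x0, dt, n, seed=0, substeps=20):
    """Simulate a square-root (CIR) path observed at spacing dt.

    A finer Euler scheme with `substeps` steps per observation interval is
    used; this is a simple illustrative simulator, not an exact sampler.
    """
    rng = random.Random(seed)
    h = dt / substeps
    x = x0
    path = [x0]
    for _ in range(n):
        for _step in range(substeps):
            noise = rng.gauss(0.0, 1.0)
            x += (a - b * x) * h + sigma * math.sqrt(max(x, 0.0) * h) * noise
            x = max(x, 1e-8)  # keep the path strictly positive
        path.append(x)
    return path


def gaussian_pseudo_loglik(path, dt, a, b, sigma):
    """Pseudo log-likelihood with Gaussian one-step-ahead transitions.

    Assumed approximation: X_{t+dt} | X_t = x ~ N(x + (a - b x) dt, sigma^2 x dt).
    """
    total = 0.0
    for x_prev, x_next in zip(path[:-1], path[1:]):
        mean = x_prev + (a - b * x_prev) * dt
        var = sigma ** 2 * x_prev * dt
        total += -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x_next - mean) ** 2 / var
    return total


if __name__ == "__main__":
    path = simulate_cir(a=2.0, b=1.0, sigma=0.3, x0=2.0, dt=0.1, n=500, seed=1)
    print(gaussian_pseudo_loglik(path, 0.1, a=2.0, b=1.0, sigma=0.3))
```

In practice this function would be maximized numerically over the parameters. Handling random drift parameters, as in the talk, additionally requires integrating over the random-effects distribution, which this sketch omits.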
Stochastic dynamic models II


Alexander Sokol Exponential martingales and changes of measure for counting processes
As noted by Gjessing et al. (2010), when formulating statistical models for counting processes in terms of a candidate intensity on general filtered probability spaces, it is often essential to have conditions on the intensity which ensure non-explosion of the processes. We show sufficient criteria for the uniform integrability of specific exponential martingales and use this to obtain sufficient criteria for non-explosion. In particular, the final criterion in Gjessing et al. (2010) for non-explosion is extended from α > 1 to α ≥ 1.

Julie Lyng Forman Testing the Markov hypothesis by nonparametric bootstrapping of a diffusion process
Nonparametric tests of the Markov hypothesis for a discretely observed stochastic process were considered by [1], who proved that a suggested test statistic is asymptotically chi-square distributed, but at the same time observed that the approximation to the chi-square distribution is poor in finite samples. For the purpose of testing the Markov property of protein reaction coordinates I propose a nonparametric bootstrap for a diffusion process based on the local linear estimator of the conditional distribution function. Pilot studies indicate that the nonparametric bootstrap gives a good approximation to the distribution of the Markov test statistic proposed by [1]. I will discuss bandwidth selection for the nonparametric estimators and present preliminary results from the analysis of the protein reaction coordinates: Was the data generated by a Markov process?
[1] Aït-Sahalia, Y., Fan, J. and Jiang, J., Nonparametric tests of the Markov hypothesis in continuous-time models. The Annals of Statistics, 38, 3129-3163, 2010.
13:30-14:30
Invited talk


Kresten Lindorff-Larsen Elucidating Structural and Folding Dynamics of Proteins by Molecular Dynamics Simulations
All-atom molecular dynamics simulations provide a vehicle for capturing the structures, motions, and interactions of biological macromolecules in full atomic detail. Such simulations have, however, been limited both in the timescales they could access and in the accuracy of computational models used in the simulations. I will begin by presenting briefly how progress has been made in both of these areas so that it is now possible to access the millisecond timescale, and how we have been able to parameterize relatively accurate energy functions. I will then present recent results that highlight how such long-timescale simulations have been used to provide insight into protein dynamics.
In the area of protein folding, I will explain how we have used simulations to describe the general principles of how fast-folding proteins fold. I will also describe how simulations can be used to describe slow motions present in proteins, in both folded and unfolded states. I will also give examples of how we analyze simulation data and the types of problems that we face in the analyses.
