Annual Meeting in the statistics network

March 3-4, 2011

Comwell hotel, Holte

How to get to the hotel by public transportation: Go to "Holte station", e.g. by S-train, and take bus 334 to "Kongevejen/Vasevej". Comwell hotel is located 200 meters further down Kongevejen.

Program:
Thursday 3/3

8:45 - 9:15 Coffee with bread, juice and fruit
9:15 - 10:35 Dynamic stochastic models
Massimiliano Tamborrino
Martin Jacobsen
10:35 - 10:50 Break
10:50 - 11:30 Statistical computing
Klaus Holst
Thomas Scheike
Thomas Gerds
11:40 - 13:00 Survival analysis
Ulla B Mogensen
Torben Martinussen
 
13:00 - 14:40 Lunch
 
14:40 - 15:20 Functional data and image analysis
Lars Lau Hansen
15:20 - 15:40 Break
15:40 - 16:10 News from the network
16:10 - 16:55 The statistics degree programme: information and debate (in Danish)
16:55 - 17:15 Break
17:15 - 18:00 Invited talk
Carsten Wiuf
Stochastic Modeling and Analysis of DNA Sequence Data from Heterogeneous Tumors
18:30 - Dinner
 

Friday 4/3

9:15 - 10:35 Bioinformatics
Jessica Kasza
Martin Vincent
10:15 - 10:40 Break
11:05 - 11:50 Invited talk
Bjarke Feenstra
Genome-wide association studies based on Danish health register data
 
12:00 - 13:30 Lunch
 
13:30 - 14:30 Invited talk
Jens Ledet Jensen
Context dependent evolutionary models
14:30 - 15:30 Coffee

Talks and Abstracts

Thursday 3/3

9:15-10:35 Dynamic stochastic models

Massimiliano Tamborrino
Weak convergence of k-dimensional Stein's processes to k-dimensional Ornstein-Uhlenbeck processes

Stein's processes are a common description of spontaneous neuronal activity, in which the discharge of a neural impulse, called a spike, is modeled as the first passage time of the process through a given threshold. Most studies of these models take a diffusion limit of Stein's equation to obtain a mathematically tractable stochastic process. Using these continuous processes has made it possible to uncover various neuronal features that are hidden in the original Stein model, for instance stochastic resonance. The existing diffusion models are generally one-dimensional. However, neuroscientists are now interested in modeling groups of neurons and investigating their dependencies. This work therefore deals with the weak convergence of a k-dimensional Stein process to a k-dimensional Ornstein-Uhlenbeck diffusion process, as well as the weak convergence of their first passage times.
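As a rough one-dimensional illustration (the talk itself concerns the k-dimensional case), the sketch below matches the first two moments of Stein's Poisson input to obtain Ornstein-Uhlenbeck drift and diffusion parameters, and then estimates a first passage time by Euler-Maruyama simulation. The function names, the parameterisation (excitatory and inhibitory jump sizes a and i, Poisson rates lam_e and lam_i, membrane time constant tau) and all numerical values are assumptions for illustration, not the authors' construction.

```python
import math
import random


def stein_to_ou_params(a, i, lam_e, lam_i):
    """Moment-matching diffusion approximation of a 1-D Stein process
    dV = -V/tau dt + a dN_e - i dN_i, with N_e, N_i Poisson (rates lam_e, lam_i).
    Returns (mu, sigma) for the OU process dX = (-X/tau + mu) dt + sigma dW."""
    mu = a * lam_e - i * lam_i                           # mean input per unit time
    sigma = math.sqrt(a ** 2 * lam_e + i ** 2 * lam_i)  # input sd per sqrt(unit time)
    return mu, sigma


def ou_first_passage_time(x0, mu, sigma, tau, threshold,
                          dt=1e-3, t_max=100.0, rng=None):
    """Euler-Maruyama simulation of the OU process started at x0 (below threshold).
    Returns the first grid time with X >= threshold, or None if not reached by t_max."""
    rng = rng or random.Random()
    x, t = x0, 0.0
    sqrt_dt = math.sqrt(dt)
    while t < t_max:
        x += (-x / tau + mu) * dt + sigma * sqrt_dt * rng.gauss(0.0, 1.0)
        t += dt
        if x >= threshold:
            return t
    return None


if __name__ == "__main__":
    mu, sigma = stein_to_ou_params(a=1.0, i=1.0, lam_e=2.0, lam_i=1.0)
    rng = random.Random(42)
    fpts = [ou_first_passage_time(0.0, mu, sigma, tau=10.0, threshold=5.0, rng=rng)
            for _ in range(50)]
    hits = [t for t in fpts if t is not None]
    if hits:
        print(f"{len(hits)} spikes, mean first passage time {sum(hits) / len(hits):.2f}")
```

The grid-based crossing check slightly overestimates first passage times, since crossings between grid points are missed; a smaller dt reduces this bias.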

Joint work with Laura Sacerdote and Martin Jacobsen


Martin Jacobsen
Diffusions with jumps: introduction and overview

Jump diffusions as Markov processes; the generator and Itô's formula; three methods of construction; the problem of stationarity; nice models for finding martingale estimating functions.
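To make the first topic concrete, the sketch below simulates a one-dimensional jump diffusion dX = b(X) dt + s(X) dW + dJ on a time grid, where J is a compound Poisson process with constant intensity. This Euler-type scheme is just one simple construction and is not claimed to be any of the three constructions in the talk; the function and parameter names, and the example model, are hypothetical.

```python
import math
import random


def poisson_sample(lam, rng):
    """Draw a Poisson(lam) variate (Knuth's multiplication method; fine for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_jump_diffusion(x0, drift, diffusion, jump_rate, jump_size,
                            t_end, n_steps, rng=None):
    """Euler scheme for dX = drift(X) dt + diffusion(X) dW + dJ, where J is compound
    Poisson with intensity jump_rate and jump sizes drawn by jump_size(rng).
    Returns the path at the n_steps + 1 grid points."""
    rng = rng or random.Random()
    dt = t_end / n_steps
    x, path = x0, [x0]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))              # Brownian increment
        n_jumps = poisson_sample(jump_rate * dt, rng)   # jumps in this time step
        jumps = sum(jump_size(rng) for _ in range(n_jumps))
        x = x + drift(x) * dt + diffusion(x) * dw + jumps
        path.append(x)
    return path


# Example (hypothetical parameters): mean-reverting diffusion with exponential upward jumps.
path = simulate_jump_diffusion(
    x0=0.0,
    drift=lambda x: -0.5 * x,
    diffusion=lambda x: 0.3,
    jump_rate=2.0,
    jump_size=lambda rng: rng.expovariate(1.0),
    t_end=10.0, n_steps=1000, rng=random.Random(0))
```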

10:50-11:30 Statistical computing

Klaus Holst, Thomas Scheike and Thomas Gerds will speak about their R packages.

Thomas Gerds
Calibration plots for predictions of absolute risk using R

Klaus Holst
Latent Variable Models in R

Thomas Scheike
HaploSurvival: Haplotype effects for survival data


11:40-13:00 Survival analysis

Torben Martinussen
Quantifying the magnitude of confounding using the Cox model and the Aalen additive hazards model.

When estimating the association between an exposure and an outcome, a simple approach to quantifying the size of confounding by a factor Z is to compare the estimates of the exposure-outcome association with and without adjustment for Z. This approach can sometimes be problematic because the adjusted and unadjusted exposure effects can differ even in the absence of confounding (Greenland, Robins and Pearl, 1999), a phenomenon referred to as the nonlinearity effect. In this talk I will explore this problem when the response is a (possibly right-censored) survival time and either the Cox model or the Aalen additive hazards model is assumed. I show that there is no nonlinearity problem under the latter model (which is perhaps not surprising), whereas the problem is present under the Cox model. I will show how to correct the measure of confounding under the Cox model.
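The nonlinearity effect can be seen without any data. In the sketch below, Z is binary and independent of the exposure X, so Z is not a confounder, and the conditional hazards are constant in time. After averaging over Z, the marginal hazard ratio under a Cox model is exp(beta) only at t = 0, whereas under an additive hazards model the marginal hazard difference stays exactly beta at every t. The parameter values are arbitrary, and the code only illustrates the phenomenon; it is not the correction proposed in the talk.

```python
import math


def marginal_hazard(t, x, rate, p_z=0.5):
    """Hazard of T given X = x at time t after averaging over Z ~ Bernoulli(p_z),
    with Z independent of X and rate(x, z) a constant conditional hazard.
    Marginal hazard = f(t | x) / S(t | x), where
    S(t | x) = sum_z P(z) exp(-rate t) and f(t | x) = sum_z P(z) rate exp(-rate t)."""
    weights = {0: 1.0 - p_z, 1: p_z}
    surv = sum(w * math.exp(-rate(x, z) * t) for z, w in weights.items())
    dens = sum(w * rate(x, z) * math.exp(-rate(x, z) * t) for z, w in weights.items())
    return dens / surv


beta, gamma, base = math.log(2.0), math.log(4.0), 1.0


def cox_rate(x, z):
    # Cox model: multiplicative effects; conditional hazard ratio for X is exp(beta) = 2
    return base * math.exp(beta * x + gamma * z)


def aalen_rate(x, z):
    # Aalen additive model: additive effects; conditional hazard difference for X is beta
    return base + beta * x + gamma * z


for t in (0.0, 0.5, 1.0, 2.0):
    hr = marginal_hazard(t, 1, cox_rate) / marginal_hazard(t, 0, cox_rate)
    hd = marginal_hazard(t, 1, aalen_rate) - marginal_hazard(t, 0, aalen_rate)
    print(f"t={t}: marginal HR (Cox) = {hr:.3f}, marginal hazard difference (Aalen) = {hd:.3f}")
```

Under the additive model the factor exp(-(base + beta x) t) cancels between numerator and denominator, which is why the X effect survives marginalisation unchanged.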


Ulla B Mogensen
Comparison of predictions in multiclass decision problems

Many medical settings involve a decision problem with a multiclass outcome. In a diagnostic study of inflammatory bowel disease (IBD), the two major types (Crohn's disease and ulcerative colitis) must be discriminated from each other and distinguished from patients without IBD on the basis of microarray data. In the Copenhagen stroke study, patients may die from stroke-related causes, die from other causes, or survive the 10-year follow-up. The aim is to predict the outcome from baseline covariates. We first discuss criteria for predictions of mutually exclusive events and then use cross-validation to compare the predictive performance of rival prediction models in both the diagnostic study and the competing risks study.
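One common criterion for probabilistic predictions of mutually exclusive outcomes is the multiclass Brier score. The sketch below computes it and compares two hypothetical rival prediction rules (overall class frequencies, and class frequencies within levels of one discrete covariate) by k-fold cross-validation. Whether the talk uses this particular criterion is an assumption here; the models and simulated data are illustrative only, and censoring in the competing risks setting is ignored.

```python
import random
from collections import Counter, defaultdict


def brier_score(pred_probs, outcomes, classes):
    """Multiclass Brier score: average over subjects of sum_k (p_k - 1{y = k})^2.
    0 is a perfect score and lower is better."""
    total = 0.0
    for probs, y in zip(pred_probs, outcomes):
        total += sum((probs.get(k, 0.0) - (1.0 if k == y else 0.0)) ** 2 for k in classes)
    return total / len(outcomes)


def fit_marginal(train):
    """Hypothetical rival model 1: ignore covariates, predict overall class frequencies."""
    counts = Counter(y for _, y in train)
    probs = {k: c / len(train) for k, c in counts.items()}
    return lambda x: probs


def fit_stratified(train):
    """Hypothetical rival model 2: class frequencies within each level of a discrete
    covariate x, falling back to overall frequencies for unseen levels."""
    by_level = defaultdict(Counter)
    for x, y in train:
        by_level[x][y] += 1
    fallback = fit_marginal(train)

    def predict(x):
        counts = by_level.get(x)
        if not counts:
            return fallback(x)
        total = sum(counts.values())
        return {k: c / total for k, c in counts.items()}

    return predict


def cv_brier(data, fit, classes, k=5, seed=0):
    """k-fold cross-validated Brier score of the prediction rule returned by fit."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    preds, obs = [], []
    for fold in (idx[i::k] for i in range(k)):
        held_out = set(fold)
        model = fit([data[i] for i in idx if i not in held_out])
        for i in fold:
            x, y = data[i]
            preds.append(model(x))
            obs.append(y)
    return brier_score(preds, obs, classes)


# Hypothetical data: a discrete covariate x in {0, 1, 2} that usually equals the class.
rng = random.Random(1)
data = []
for _ in range(300):
    x = rng.randrange(3)
    y = x if rng.random() < 0.9 else rng.choice([c for c in range(3) if c != x])
    data.append((x, y))
classes = [0, 1, 2]
print("marginal:  ", round(cv_brier(data, fit_marginal, classes), 3))
print("stratified:", round(cv_brier(data, fit_stratified, classes), 3))
```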

14:40-15:20 Functional data analysis and image analysis

Lars Lau Hansen
Operator approximations and analysis of multivariate functional data

Abstract: In this talk we will consider models for multi-dimensional functional data where the roughness of the underlying functions is penalized. Thinking of the data as an observed function rather than discretely sampled points turns out to provide considerable benefits. It will be shown how statistical quantities can be identified by solving partial differential equations, and that the need for computational resources is dramatically reduced when working in the functional domain.
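A one-dimensional, discretised sketch of roughness penalisation, assuming a second-difference penalty as a stand-in for the integral of f''(t)^2: minimising sum_i (y_i - f_i)^2 + lam * sum (second differences of f)^2 leads to the linear system (I + lam D'D) f = y. This is the discrete counterpart of the differential equation f + lam f'''' = y that arises in the continuous version of the problem. It is a textbook smoother, not the operator approximations of the talk; the names and the plain-Python solver are illustrative.

```python
import math


def second_difference_gram(n):
    """Return D'D, where D is the (n - 2) x n second-difference matrix (rows 1, -2, 1);
    sum((D f)^2) is a discrete stand-in for the integral of f''(t)^2."""
    D = [[0.0] * n for _ in range(max(n - 2, 0))]
    for r in range(n - 2):
        D[r][r], D[r][r + 1], D[r][r + 2] = 1.0, -2.0, 1.0
    return [[sum(row[i] * row[j] for row in D) for j in range(n)] for i in range(n)]


def solve(A, b):
    """Gaussian elimination with partial pivoting (adequate for small systems)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[piv] = M[piv], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def penalized_smooth(y, lam):
    """Minimise sum (y_i - f_i)^2 + lam * sum (second differences of f)^2.
    Setting the gradient to zero gives (I + lam * D'D) f = y."""
    n = len(y)
    P = second_difference_gram(n)
    A = [[(1.0 if i == j else 0.0) + lam * P[i][j] for j in range(n)] for i in range(n)]
    return solve(A, y)


# Example: a sine curve with hypothetical alternating noise.
noisy = [math.sin(0.3 * i) + 0.2 * (-1) ** i for i in range(30)]
print([round(v, 2) for v in penalized_smooth(noisy, lam=5.0)])
```

Linear functions have zero second differences, so they pass through the smoother unchanged for any lam; only the rough component is shrunk.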


17:15-18:00 Invited Lecture

Carsten Wiuf
Stochastic Modeling and Analysis of DNA Sequence Data from Heterogeneous Tumors

Abstract: Many cancers are believed to have a clonal origin, starting from a single cell with a defining mutation whose lineage acquires one or more additional mutations before the first cancerous cell is established. The population of cancer cells continues to evolve over time and accumulates further genetic changes. Consequently, cells in different parts of a tumor might differ in their genomes, or DNA. This phenomenon is referred to as genetic heterogeneity.

Here, I address the problem of modeling how the tumor evolves over time and accumulates changes in its DNA, starting from the initial cell with a defining mutation. The model is stochastic and relies on birth-death processes. I show that there is a simple description of how the (stochastic) number of tumor cells in the system changes over time, and that the model imposes constraints on the parameters that determine cell replication; thus the model yields biological insight.

Further, the model leads to a simple way of simulating tumor evolution. Based on this, two follicular lymphoma data sets are used to draw inference on the model parameters and on the relative ages of the tumor origin and of the defining and subsequent mutations. The latter might have clinical relevance.
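The birth-death building block can be simulated with the Gillespie algorithm. In the sketch below each cell independently divides at rate birth_rate and dies at rate death_rate, so the expected number of cells is n0 * exp((birth_rate - death_rate) * t). This assumes constant per-cell rates and ignores mutations, so it is far simpler than the model in the talk; all names and parameter values are hypothetical.

```python
import math
import random


def simulate_birth_death(n0, birth_rate, death_rate, t_end, rng=None):
    """Gillespie simulation of a linear birth-death process up to time t_end.
    Returns a list of (event time, number of cells), starting with (0.0, n0)."""
    rng = rng or random.Random()
    t, n = 0.0, n0
    path = [(t, n)]
    per_cell = birth_rate + death_rate
    if per_cell == 0:
        return path
    while n > 0:
        t += rng.expovariate(n * per_cell)   # waiting time to the next event
        if t > t_end:
            break
        if rng.random() < birth_rate / per_cell:
            n += 1                           # a cell divides
        else:
            n -= 1                           # a cell dies
        path.append((t, n))
    return path


# Compare a Monte Carlo mean with n0 * exp((b - d) * t) (hypothetical parameters).
rng = random.Random(7)
n0, b, d, t_end = 10, 1.0, 0.5, 1.0
finals = [simulate_birth_death(n0, b, d, t_end, rng)[-1][1] for _ in range(2000)]
print(sum(finals) / len(finals), n0 * math.exp((b - d) * t_end))
```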


Friday 4/3

9:15-10:35 Bioinformatics

Jessica Kasza
Methods for the estimation of Bayesian networks with exogenous variables

Bayesian networks are flexible frameworks for representing the conditional independence relationships among a set of variables. Methods for estimating them typically require a data set consisting of independent and identically distributed samples. Often, however, the available data set is more complex and contains information on exogenous variables thought to affect the variables of interest. Here, two methods for estimating a Bayesian network from such a data set will be discussed. The two approaches will be compared, and their use demonstrated by applying them to a gene expression data set that contains data on covariates thought to affect gene expression levels.
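One simple way to account for exogenous variables, sketched below under a linear-Gaussian assumption, is to include the covariates as extra regressors in every node's local regression and score candidate network structures by BIC. This is a generic illustration and not necessarily either of the two methods presented in the talk; the function names, the simulated "genes" and the covariate are hypothetical.

```python
import math
import random


def ols_rss(y, columns):
    """Residual sum of squares from regressing y on an intercept plus the given
    columns, via the normal equations and Gaussian elimination with partial pivoting."""
    n = len(y)
    X = [[1.0] + [col[r] for col in columns] for r in range(n)]
    p = len(X[0])
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)] for i in range(p)]
    v = [sum(X[r][i] * y[r] for r in range(n)) for i in range(p)]
    for k in range(p):
        piv = max(range(k, p), key=lambda i: abs(A[i][k]))
        A[k], A[piv] = A[piv], A[k]
        v[k], v[piv] = v[piv], v[k]
        for i in range(k + 1, p):
            f = A[i][k] / A[k][k]
            for j in range(k, p):
                A[i][j] -= f * A[k][j]
            v[i] -= f * v[k]
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (v[i] - sum(A[i][j] * b[j] for j in range(i + 1, p))) / A[i][i]
    return sum((y[r] - sum(X[r][j] * b[j] for j in range(p))) ** 2 for r in range(n))


def gaussian_bic(data, parents, exogenous):
    """BIC of a linear-Gaussian Bayesian network. parents maps each node to its
    parent nodes; every node is additionally regressed on all exogenous covariates."""
    n = len(next(iter(data.values())))
    score = 0.0
    for node, pa in parents.items():
        cols = [data[q] for q in pa] + list(exogenous.values())
        rss = ols_rss(data[node], cols)
        n_params = len(cols) + 2  # slopes + intercept + residual variance
        loglik = -0.5 * n * (math.log(2 * math.pi * rss / n) + 1.0)
        score += loglik - 0.5 * n_params * math.log(n)
    return score


# Hypothetical data: covariate c affects both genes; g1 -> g2 is the only network edge.
rng = random.Random(0)
n = 200
c = [rng.gauss(0, 1) for _ in range(n)]
g1 = [ci + rng.gauss(0, 1) for ci in c]
g2 = [0.8 * a + ci + rng.gauss(0, 1) for a, ci in zip(g1, c)]
data, exo = {"g1": g1, "g2": g2}, {"c": c}
with_edge = gaussian_bic(data, {"g1": [], "g2": ["g1"]}, exo)
no_edge = gaussian_bic(data, {"g1": [], "g2": []}, exo)
print(with_edge > no_edge)
```

Without the covariate, g1 and g2 would be correlated through c alone, so an edge could be favoured even if the network contained none; adjusting every local regression for c removes that spurious dependence.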


Martin Vincent
Regularized multinomial regression using sparse group lasso

We consider regularized multinomial regression using the sparse group lasso. The sparse group lasso penalty combines the lasso penalty (L1 norm) with the group lasso penalty (L2 norm). After introducing the sparse group lasso method, we shall investigate some of its characteristics. In particular, we shall see that in some cases the method produces predictors that are sparser at the feature level (i.e. select fewer features) than multinomial regression using only the L1 penalty (lasso).

As a practical example, we apply the sparse group lasso method to a cancer data set. The data set consists of RT-qPCR measurements of microRNA expression levels in 197 primary cancer tumors divided into 9 classes.
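Penalised fits of this kind are commonly computed with proximal-gradient steps. Assuming the common parameterisation lam * ((1 - alpha) * sum_g sqrt(p_g) * ||beta_g||_2 + alpha * ||beta||_1), where in multinomial regression a group g can be taken as the coefficients of one feature across all classes, the proximal operator for one group is elementwise soft-thresholding followed by groupwise shrinkage. The sketch below shows only that step; it is not the implementation used in the talk, and the names are hypothetical.

```python
import math


def soft_threshold(v, thresh):
    """Elementwise lasso shrinkage: sign(v_j) * max(|v_j| - thresh, 0)."""
    return [math.copysign(max(abs(x) - thresh, 0.0), x) for x in v]


def sgl_prox_group(v, step, lam, alpha):
    """Proximal operator of step * lam * ((1 - alpha) * sqrt(p_g) * ||b||_2 + alpha * ||b||_1)
    for a single group b of size p_g, evaluated at v."""
    u = soft_threshold(v, step * lam * alpha)      # L1 part: zero out small coefficients
    norm = math.sqrt(sum(x * x for x in u))
    group_thresh = step * lam * (1.0 - alpha) * math.sqrt(len(v))
    if norm <= group_thresh:                       # L2 part: drop the whole group (feature)
        return [0.0] * len(v)
    scale = 1.0 - group_thresh / norm              # otherwise shrink the group towards zero
    return [scale * x for x in u]


# One feature's coefficients across three classes (hypothetical values).
print(sgl_prox_group([3.0, -1.0, 0.5], step=1.0, lam=1.0, alpha=0.5))
```

With alpha = 1 the step reduces to the ordinary lasso soft-thresholding, and with alpha = 0 to the group lasso; intermediate values can zero out an entire feature as well as individual class coefficients.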


11:05-11:50 Invited Lecture

Bjarke Feenstra
Genome-wide association studies based on Danish health register data

With the advent of high-throughput genotyping microarrays some 5 years ago, genome-wide association studies (GWAS) emerged as a hypothesis-free method for screening the entire genome for disease-related genetic variants. A typical GWAS data set consists of a few thousand persons, each genotyped for more than 500,000 single nucleotide polymorphisms (SNPs). These genetic data are analyzed for association with phenotypic data, such as disease status. In Denmark, we benefit from detailed nationwide health registers, which allow very cost-efficient genetic screening for many diseases. At Statens Serum Institut, we are currently conducting several GWAS based on health register data. In the talk, I will present an example. Using dental data from the nationwide orthodontic registry for children, we conducted a GWAS on the timing of permanent tooth eruption and identified four genomic regions with robust association. All four signals were replicated in independent sample sets from the United States and Denmark. I will round off by discussing some of the additional possibilities offered by this type of data.
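For a quantitative trait such as the timing of tooth eruption, the basic GWAS step is a separate regression of the phenotype on each SNP's allele count (0, 1 or 2), judged against a stringent threshold; 5 x 10^-8 is the conventional genome-wide significance level. The sketch below uses a normal approximation for the test statistic and omits covariates and population-structure adjustment, so it is a simplified illustration and not the pipeline used at Statens Serum Institut; the data are simulated and the names are hypothetical.

```python
import math
import random

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance level


def snp_association(genotypes, phenotype):
    """Regress a quantitative phenotype on allele counts (0/1/2) for one SNP.
    Returns (beta, standard error, two-sided p-value via a normal approximation),
    or None for a monomorphic SNP."""
    n = len(genotypes)
    gbar = sum(genotypes) / n
    ybar = sum(phenotype) / n
    sxx = sum((g - gbar) ** 2 for g in genotypes)
    if sxx == 0:
        return None
    sxy = sum((g - gbar) * (y - ybar) for g, y in zip(genotypes, phenotype))
    beta = sxy / sxx
    rss = sum((y - ybar - beta * (g - gbar)) ** 2 for g, y in zip(genotypes, phenotype))
    se = math.sqrt(rss / (n - 2) / sxx)
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))  # P(|Z| > |z|) for Z ~ N(0, 1)
    return beta, se, p


# Simulated example: SNP 0 has a true effect, SNPs 1-4 are null (hypothetical data).
rng = random.Random(2011)
n = 2000
snps = [[sum(rng.random() < 0.3 for _ in range(2)) for _ in range(n)] for _ in range(5)]
phenotype = [0.3 * g + rng.gauss(0.0, 1.0) for g in snps[0]]
results = [snp_association(s, phenotype) for s in snps]
hits = [j for j, r in enumerate(results) if r is not None and r[2] < GENOME_WIDE_ALPHA]
print(hits)
```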


13:30-14:30 Invited Lecture

Jens Ledet Jensen
Context dependent evolutionary models

I consider continuous-time Markov models for the evolution of a DNA string. The models are used for the analysis of aligned DNA sequences. In the simplest case one has two aligned sequences, and more generally one has several aligned sequences connected in a known phylogenetic tree. The talk will partly be a review, starting with simple models for independent nucleotides and progressing to independent codons before coming to the context-dependent models. In the latter models, the instantaneous rates for a change at a position depend on the values of the neighbouring sites. I will spend some time discussing reversibility of the process before turning to inference problems. Simulations, as an aid in the estimation process, seem unavoidable. Asymptotic normality of the estimates can be treated through the theory of hidden Markov models, but I do not plan to dwell on this. If time permits, I will make some remarks on calculations in endpoint-conditioned Markov chains.
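Because the rate at a site depends on its neighbours, sites no longer evolve independently, and forward simulation becomes a natural tool. The sketch below simulates such a process along a single branch with the Gillespie algorithm, recomputing all site rates after every substitution. It assumes a hypothetical context effect in which C and G in a CpG dinucleotide mutate cpg_factor times faster, with the new nucleotide chosen uniformly among the other three; with cpg_factor = 1 it reduces to an independent-sites Jukes-Cantor model. The rate structure and names are illustrative assumptions, not the models from the talk.

```python
import random

NUCLEOTIDES = "ACGT"


def site_rate(seq, i, base_rate, cpg_factor):
    """Total substitution rate at position i. Hypothetical context effect: a C followed
    by G, or a G preceded by C (a CpG site), mutates cpg_factor times faster."""
    in_cpg = (seq[i] == "C" and i + 1 < len(seq) and seq[i + 1] == "G") or \
             (seq[i] == "G" and i > 0 and seq[i - 1] == "C")
    return base_rate * (cpg_factor if in_cpg else 1.0)


def simulate_context_dependent(seq, t_end, base_rate=1.0, cpg_factor=10.0, rng=None):
    """Gillespie simulation of context-dependent substitutions along one branch of length t_end."""
    rng = rng or random.Random()
    seq = list(seq)
    t = 0.0
    while True:
        # Rates depend on neighbours, so they are recomputed after every event.
        rates = [site_rate(seq, i, base_rate, cpg_factor) for i in range(len(seq))]
        total = sum(rates)
        if total == 0:
            break
        t += rng.expovariate(total)
        if t > t_end:
            break
        i = rng.choices(range(len(seq)), weights=rates)[0]   # site chosen proportional to rate
        seq[i] = rng.choice([b for b in NUCLEOTIDES if b != seq[i]])
    return "".join(seq)


evolved = simulate_context_dependent("ACGCGTTACGGA", t_end=0.3, rng=random.Random(5))
print(evolved)
```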