Associate Professor
Department of Mathematical Sciences
Universitetsparken 5
DK-2100 Copenhagen Ø
Denmark
+45 - 35 32 07 83

Statistics, probability theory and biological applications. Some areas of primary interest are:

  • Statistics
    Statistical learning, computational statistics, inference for point processes, regression, lasso and high-dimensional data analysis.
  • Probability theory
    Stochastic processes, simulation, Malliavin calculus for risk estimation and forecasting.
  • Multivariate point process models
    I work on general classes of point process models suitable for capturing dependence structures among different one-dimensional point processes. Applications include neuron spike time data analysis, analysis of genome organization and news information flow. I am particularly interested in infinite-dimensional parametrization of filter functions and penalized maximum-likelihood estimation including lasso and grouped-lasso methods. The project is accompanied by the development of the ppstat package for R.
  • Sparse Group Lasso
    The sparse group lasso penalty allows for group-wise regularization and selection properties as well as regularization and selection of individual parameters. We have developed general algorithms and generic implementations, and we currently apply sparse group lasso to multiclass classification problems, in particular in cancer diagnostics. We work on extending the applications to support selection and selection of functional components, potentially in combination with smoothing.
  • Intervention effect estimation via stochastic processes
    In this project we consider modeling the entire dynamics of a stationary process to compute intervention effects in such models. The estimation is based either on the invariant distribution, using cross-sectional data, or on dynamic data. In either case, the effect of imposing sparseness assumptions in different parametrizations and techniques for tuning parameter selection are explored. In particular, a relation between the Malliavin calculus and Stein's unbiased risk estimate is explored.
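    To make the sparse group lasso penalty mentioned above concrete, here is a minimal Python sketch. It assumes one common parametrization, lam * ((1 - alpha) * sum over groups of sqrt(p_g) * ||beta_g||_2 + alpha * ||beta||_1), where p_g is the size of group g. The function name, the argument names and the sqrt(p_g) group weights are illustrative assumptions and not necessarily those used in our implementations.

```python
import math


def sparse_group_lasso_penalty(beta, groups, lam, alpha):
    """Evaluate a sparse group lasso penalty (illustrative parametrization).

    beta   -- list of coefficients
    groups -- list of index lists partitioning range(len(beta))
    lam    -- overall regularization strength (lam >= 0)
    alpha  -- mixing weight in [0, 1]: 1 gives the lasso, 0 the group lasso
    """
    # Lasso part: sum of absolute values of all coefficients.
    l1_term = sum(abs(b) for b in beta)
    # Group part: Euclidean norm of each group, weighted by sqrt(group size).
    group_term = 0.0
    for idx in groups:
        norm = math.sqrt(sum(beta[i] ** 2 for i in idx))
        group_term += math.sqrt(len(idx)) * norm
    return lam * ((1.0 - alpha) * group_term + alpha * l1_term)


# Example: the second group is entirely zero and contributes nothing.
beta = [3.0, 4.0, 0.0, 0.0]
groups = [[0, 1], [2, 3]]
print(sparse_group_lasso_penalty(beta, groups, lam=1.0, alpha=0.5))
```

    The group term is not differentiable when a whole group is zero, which is what allows entire groups to be removed, while the lasso term can additionally set individual coefficients within a retained group to zero.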
  • 2012

    Alexander Sokol, Niels Richard Hansen
    Exponential martingales and changes of measure for counting processes
    Submitted to Bernoulli

    Martin Vincent, Niels Richard Hansen
    Sparse group lasso and high dimensional multinomial classification
    Submitted to Computational Statistics & Data Analysis

    2010

    Niels Richard Hansen
    Penalized maximum likelihood estimation for generalized linear point processes
    Submitted

    2010

    Lisbeth Carstensen, Albin Sandelin, Ole Winther and Niels Richard Hansen
    Multivariate Hawkes process models of the occurrence of regulatory elements
    BMC Bioinformatics 2010, 11:456.

    2009

    Niels Richard Hansen
    Statistical models for local occurrences of RNA-structures
    Journal of Computational Biology. June 2009, 16(6): 845-858.

    Niels Richard Hansen
    The maximum of a Lévy process reflected at a general barrier
    Stochastic Processes and Their Applications, 119(7), 2336-2356.

    Shiraz Ali Shah, Niels R. Hansen and Roger A. Garrett
    Distribution of CRISPR spacer matches in viruses and plasmids of crenarchaeal acidothermophiles and implications for their inhibitory mechanism.
    Biochemical Society Transactions, vol. 37, 23–28.

    2007

    Niels Richard Hansen
    Asymptotics for Local Maximal Stack Scores with General Loop Penalty
    Advances in Applied Probability, vol. 39, No. 3, 776-798.

    2006

    Niels Richard Hansen
    Local alignment of Markov chains
    The Annals of Applied Probability, Vol. 16, No. 3, 1262-1296.

    Niels Richard Hansen
    The maximum of a random walk reflected at a general barrier
    The Annals of Applied Probability, Vol. 16, No. 1, 15-29.

    2005

    Niels Richard Hansen, Anders Tolver Jensen
    The extremal behavior over regenerative cycles for Markov additive processes with heavy tails
    Stochastic Processes and Their Applications, 115(4), 579-591.

    2003

    Niels Richard Hansen
    Geometric ergodicity of discrete time approximations to multivariate diffusions
    Bernoulli 9(4), 725-743.

    2002

    Niels Richard Hansen, Ernst Hansen
    Establishing geometric drift via the Laplace transform of symmetric measures
    Statistics & Probability Letters, volume 60, nr. 3, 289-295.

    2001

    Søren Boel, Toke Meier Carlsen, Niels Richard Hansen
    A useful strengthening of the Stone-Weierstrass theorem
    The American Mathematical Monthly, volume 108, nr. 7, 642-643.

    2011

    Alexander Sokol, Niels Richard Hansen
    Using measure changes to construct stochastic intensity point processes
    Poster. Presented at the Dynstoch meeting in Heidelberg.

    2005

    Niels Richard Hansen
    A note on stochastic context-free grammars, termination and the EM-algorithm
    Unpublished preprint

    2005

    Niels Richard Hansen
    Local stacks in a Markov chain
    Unpublished preprint

    2004

    Niels Richard Hansen
    Bioinformatik - en statistisk disciplin (in Danish)
    Matilde 19, March 2004

    2003

    Niels Richard Hansen
    Markov Controlled Excursions, Local Alignment and Structure
    From Markov additive processes to biological sequence analysis

    Ph.D. Thesis

    2000

    Niels Richard Hansen
    Classification of Markov chains on R^k
    Unpublished preprint

    Niels Richard Hansen
    Convergence Speed of Markov Chains with Emphasis on Discrete Approximations of the Langevin Diffusion and the Metropolis-Hastings Algorithm
    Master's Thesis

    1997

    Niels Richard Hansen
    Riesz repræsentationssætning og Lebesgue målet (in Danish)
    Famøs March 1997.

    2011

    Correlation measures and models of multiple ChIP-seq peak collections
    IMS invited talk, WNAR

    Multivariate sparse dynamic process modeling and inference
    Departmental seminar, Stanford University

    Correlations of ChIP-Seq Peaks and Other Genomic Signals
    Statistics and Genomics Seminar, UC Berkeley

    2010

    Multivariate point process models of genome organization.
    Nordstat, 2010

    Multivariate point process models with applications to genomic organization.
    Seminar, Oxford University, 2010

    Multivariate point process models
    Annual statistics meeting, University of Copenhagen, 2010

    2009

    Non-parametric estimation of linear filter functions
    Premeeting, two-day meeting in Odense, 2009

    2008

    Maxima of Reflected Processes with Applications
    Departmental seminar, Stanford University, 2008

    Point Process and Marked Point Process Models of Features on Genomes
    International Symposium. Recent Challenges for Statistics in the Biosciences
    100 Years after Gustav Zeuner, 2008

    2007

    Point process models of motifs in biological sequences
    Mini workshop (preceding two-day meeting DSTS), DTU, 2007

    Point processes in biological sequence analysis
    Lecture I: Probabilistic analysis of simple models
    Lecture II: Statistical modeling
    Stochastics Meeting Lunteren, 2007

    Point process models in genome analysis
    Seminar, Gothenburg, 2007

    Brownsk Bevægelse
    fra pollenkorn til matematisk blomst

    HCØ-dage, Copenhagen, 2007

    Lévy Processes Reflected at a General Barrier
    5th International Conference on Lévy Processes, Copenhagen, 2007

    2006

    Asymptotics for Local Maximal Stack Scores with General Loop Penalty
    31st Conference on Stochastic Processes and their Applications, Paris, 2006

    Discriminative estimation via an asymptotic exponential tail property
    21st Nordic Conference on Mathematical Statistics, Rebild, 2006

    Detecting local deviations. Optimization and applications to RNA-gene searching
    Conference on Stochastics in Science in Honor of Ole E. Barndorff-Nielsen. Guanajuato, Mexico

    Locating small stem-loop RNA-structures in genomes
    Max Planck Institute for Molecular Genetics

    2005

    Structural RNA Searching
    SemStat 2005, University of Warwick

    Local Maximal Stack Scores with General Loop Penalty Function
    EVA 2005, Gothenburg

    Testing local deviations from hypotheses for biological sequence generation
    Two-day meeting DSTS, University of Aarhus

    Stem-loop search in DNA-sequences
    Bioinformatics Centre, University of Copenhagen

    2004

    The Maximum of a Random Walk Reflected at a General Barrier
    DYNSTOCH workshop, Copenhagen

    Markov controlled excursions, local alignment and structure
    Ph.D. defense, University of Copenhagen

    2003

    Classification of Biological Sequences
    Bioinformatics Centre, University of Copenhagen

    Local Similarity Scores in Semi-Markov Models of Biological Sequences
    EYSM meeting

    2003

    Folding of random sequences with applications to RNA-gene finding
    UC Berkeley

    2002

    Significant folding of random sequences
    Lund University

    Local similarity for sequences
    University of Aarhus

  • Main teaching duties
    At the Department of Mathematical Sciences I teach statistics and probability theory.

    I have mostly been involved in the teaching of master level courses in statistics and probability theory or the teaching of introductory courses in biostatistics and statistical aspects of bioinformatics.
  • Statistical Learning
    I take a special interest in developing a course in statistical learning adequate for the master students in statistics, eScience and bioinformatics. The course I am developing focuses on statistical learning for classification problems, with the point of view that this is a dichotomous regression model selection problem, where the objective is prediction of the 0-1-variable and the covariate vector is high-dimensional. Typical methodology is lasso regression in the glm/gam-context, classification trees and boosting techniques, but also classical multivariate techniques such as LDA/QDA, regularized modifications and naive Bayes in general.
  • Regression Analysis
    I give a classical glm-based course in regression analysis. In contrast to the statistical learning course the focus is on interpretation of parameters, confidence intervals and tests. We discuss the difference between observational data and designed experiments.
  • Introductory Statistics
    I have over the last years developed an introductory master's course in probability theory and statistics, which is particularly aimed at problems in the natural sciences. Currently we emphasize problems in biology/bioinformatics and applications relevant to the master's in eScience. We attempt to take a computational and simulation based approach (using R) to develop the fundamental theoretical ideas such as transformations and distributions of test statistics, constructions of confidence intervals etc.
  • 09(2)

    Statistics BI/E (bioinformatics/eScience)

    09(1)

    Probability Theory 1 (Sand1)/Measure and integration theory (MI)

    09

    Ph.D.-course: Statistical analysis of gene expression data with R and Bioconductor

    09(4)

    Statistical Learning

    BMC-course: Statistical Learning and bioinformatics

    08(2)

    Statistics BI/E (bioinformatics/eScience)

    08(1)

    Probability Theory 1 (Sand1)/Measure and integration theory (MI)

    08(4)

    Flerdimensional Analyse

    08(3)

    Multiple Testing

    07-08(2)

    Statistics BI (bioinformatics)

    Statistics BK (biochemistry)

    Statistics MB (molecular biomedicine)

    2007

    Ph.D.-Course: Statistical Analysis of Microarray Expression Data with R and Bioconductor

    2007(3)

    Statistical Learning

    06-07(2)

    Statistics BI (bioinformatics)

    Statistics BK (biochemistry)

    2006(3)

    Random Walks with Applications

    2005(2)

    Statistics BI (bioinformatics)

    2004(2)

    Statistical Aspects of Sequence Alignment

    2004(1)

    Statistics for Bioinformaticians

    2004

    Statistical Learning

    2002

    Advanced Probability Theory

    2002

    Statistics and Probability for Bioinformaticians (in Danish)

    2001

    Kategoriske tidsrækker og biologisk sekvensanalyse (in Danish)

    2010-

    Alexander Sokol
    Causal inference

    2009-

    Martin Vincent
    Identification of miRNA signatures that can guide rational cancer therapy.

    2006-2010

    Lisbeth Carstensen
    Statistical analysis of multi-factorial gene regulation

    2004

    Markov chains, theory, applications, and simulations
    Course given to high-school teachers in mathematics.

    Statistical Learning [Bachelor and Master level]

    From the point of view of this author, statistical learning is the subject that deals with regression (the observable being continuous or discrete) using a model with an infinite dimensional parameter space. Thus classical models with finite dimensional parameters are abandoned to achieve greater flexibility and a better ability to capture such things as non-linear effects. A popular expression is to say that "we let the data talk" and put only minimal assumptions into the model. As a consequence, generally no attempt is made to infer the entire, infinite dimensional parameter from a finite dataset, but rather some function of the parameter such as a conditional mean value. The objective is almost always to make predictions/classifications.

    Due to the setup, there is no general, all purpose method such as MLE for finite dimensional parameters. Consequently there is a range of different methods with their strengths and weaknesses. Some methods like generalized additive models are close in spirit to classical statistical models, and other methods like neural networks, kernel methods such as support vector machines, and tree based regression and classification are more remote.

    It is possible to pick almost any of these typical methods from statistical learning (generalized additive models, neural networks, support vector machines, trees, etc.) and do a project on it.

    More advanced topics include combination methods (committees, boosting, bagging) and evaluation methods (cross-validation, bootstrapping, AIC, BIC) and their theoretical foundation. Turning up the theoretical level a little more it is possible to do a Master's thesis on consistency and rate of convergence of many of these methods.

    Reflected processes [Bachelor and Master level]

    In general, reflection is a boundary modification of the behavior of an underlying stochastic process with the purpose of restricting the movements of the process to a certain part of the state space.

    There are several possibilities for doing a project on this topic. The reflected random walk - a discrete time process - is a most central object in queuing theory or in biological sequence analysis. It is possible to do a Bachelor's project, say, on the reflected random walk.
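    As a small illustration of the reflected random walk, the following Python sketch simulates reflection at 0 in two equivalent ways: by the recursion W_n = max(0, W_{n-1} + X_n), and by subtracting the running minimum from the free walk. The function names and the Gaussian step distribution with negative drift are illustrative choices, not part of any specific project description.

```python
import random


def reflected_random_walk(steps):
    """Path W_0, ..., W_n of the walk reflected at 0, started at W_0 = 0."""
    path = [0.0]
    for x in steps:
        # Reflection: the process is pushed back to 0 whenever it would go below.
        path.append(max(0.0, path[-1] + x))
    return path


def reflect_via_running_minimum(steps):
    """Same path, computed as S_n - min(0, S_1, ..., S_n) for the free walk S."""
    s = 0.0
    running_min = 0.0
    path = [0.0]
    for x in steps:
        s += x
        running_min = min(running_min, s)
        path.append(s - running_min)
    return path


# Example with negative drift, so the reflected walk keeps returning to 0.
rng = random.Random(0)
steps = [rng.gauss(-0.5, 1.0) for _ in range(1000)]
path = reflected_random_walk(steps)
print("maximum of the reflected walk:", max(path))
```

    The maximum of the reflected path over a long stretch is the kind of quantity that appears as a waiting time in queueing and as a local score in sequence analysis.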

    In continuous time one can study reflections of Lévy processes with applications to dams, storages, queues, risk theory etc. (Master's level). A Bachelor's project on reflections of one-dimensional Brownian motion (Sand 4 necessary) can be based on [Harrison, Brownian motion and stochastic flow systems, 1985].

    The topics above can all be turned into a Master's thesis. Some special problems that are also suitable for a Master's thesis are multivariate reflection of Brownian motion and/or time-dependent boundaries. A very specific problem [Master's thesis] in direct continuation of my own research is to study the extremal behavior of a one-dimensional Lévy process reflected at a time-dependent barrier. In particular there are some open problems and small conjectures when the running maximum of the reflected process drifts to infinity.

    Applications of Markov processes [Bachelor and Master level]

    I can suggest a number of concrete topics drawing on (parts of) the general theory of Markov processes.

    Continuous time Markov processes on a discrete state space applied to models of molecular evolution can form a fine Bachelor's project.
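    As an example of what such a project starts from, the sketch below computes the transition probabilities of perhaps the simplest continuous time Markov model of nucleotide substitution, the Jukes-Cantor model, in which every nucleotide is replaced by each of the three others at the same rate. The choice of this particular model and all names in the code are illustrative assumptions. The code also checks the semigroup property P(s + t) = P(s) P(t) numerically.

```python
import math


def jukes_cantor_matrix(t, rate=1.0):
    """Transition probabilities P(t) over the states A, C, G, T.

    Each nucleotide is replaced by each of the three others at the same
    rate `rate`, so the total rate of leaving a state is 3 * rate.
    """
    decay = math.exp(-4.0 * rate * t)
    same = 0.25 + 0.75 * decay  # probability of observing the same nucleotide
    diff = 0.25 - 0.25 * decay  # probability of one specific other nucleotide
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


def matmul(a, b):
    """Plain matrix product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Numerical check of the semigroup property P(s + t) = P(s) P(t).
p_sum = jukes_cantor_matrix(0.3 + 0.5)
p_prod = matmul(jukes_cantor_matrix(0.3), jukes_cantor_matrix(0.5))
print(max(abs(p_sum[i][j] - p_prod[i][j]) for i in range(4) for j in range(4)))
```

    More realistic substitution models have unequal rates, and the transition probabilities are then obtained from a matrix exponential rather than from a closed formula.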

    Stochastic stability and the theory of discrete time Markov processes on a general state space form another good topic. Applications include financial time series (ARMA, ARCH, GARCH etc) and MCMC (see below).

    A more advanced topic [Master's thesis] is multivariate diffusions (continuous time Markov processes) applied to problems in molecular dynamics. Here it is again important to study stability as well as meta-stability (difficult). Other issues of importance are methods for efficient reduction of dimension and parameter estimation.

    Markov Chain Monte Carlo (MCMC) [Bachelor and Master level]

    There are many possibilities for doing a project on MCMC. I can suggest applications to problems in bioinformatics, and a more specialized application to exact tests in multi-dimensional contingency tables. I am quite open to suggestions in this area.
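    For readers new to the area, here is a minimal sketch of a random-walk Metropolis sampler, one of the simplest MCMC algorithms. The target (a standard normal distribution, given through its log-density up to a constant), the Gaussian proposal, and all function and parameter names are illustrative assumptions. The sketch is not tied to the applications mentioned above.

```python
import math
import random


def random_walk_metropolis(log_density, start, n_samples, step=1.0, seed=0):
    """Sample from a density known up to a constant via random-walk Metropolis.

    Returns the list of states visited and the observed acceptance rate.
    """
    rng = random.Random(seed)
    x = start
    log_p = log_density(x)
    samples = []
    accepted = 0
    for _ in range(n_samples):
        # Symmetric Gaussian proposal centred at the current state.
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)), on the log scale.
        if math.log(1.0 - rng.random()) < log_p_new - log_p:
            x, log_p = proposal, log_p_new
            accepted += 1
        samples.append(x)
    return samples, accepted / n_samples


# Example: standard normal target, log-density up to an additive constant.
samples, rate = random_walk_metropolis(lambda x: -0.5 * x * x, 0.0, 20000)
mean = sum(samples) / len(samples)
print("sample mean:", mean, "acceptance rate:", rate)
```

    The chain is a discrete time Markov process on a general state space, so questions about its convergence speed connect directly to the stochastic stability topic above.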

    Point processes [Bachelor and Master level]

    Poisson point processes (homogeneous or inhomogeneous) form a good subject for a Bachelor's project. It can be theoretical or practical.
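    On the practical side, an inhomogeneous Poisson process on an interval can be simulated by thinning a homogeneous one: candidate points are generated at a constant rate that dominates the intensity, and each candidate is kept with probability equal to the ratio of the intensity to that rate. The sketch below assumes the intensity is bounded by `upper_bound` on [0, horizon]; the function name, the parameters and the example intensity are illustrative.

```python
import math
import random


def simulate_poisson_by_thinning(intensity, upper_bound, horizon, seed=0):
    """Points of a Poisson process on [0, horizon] with the given intensity.

    Assumes 0 <= intensity(t) <= upper_bound for all t in [0, horizon].
    """
    rng = random.Random(seed)
    points = []
    t = 0.0
    while True:
        # Next candidate from a homogeneous Poisson process with rate upper_bound.
        t += rng.expovariate(upper_bound)
        if t > horizon:
            break
        # Keep the candidate with probability intensity(t) / upper_bound.
        if rng.random() * upper_bound <= intensity(t):
            points.append(t)
    return points


# Example: a periodically varying intensity bounded by 3.
points = simulate_poisson_by_thinning(lambda t: 2.0 + math.sin(t), 3.0, 100.0)
print(len(points), "points; expected number is the integral of the intensity")
```

    With a constant intensity equal to the bound, every candidate is kept and the result is a homogeneous Poisson process.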

    A more advanced topic is weak convergence - or convergence in distribution - of point processes. Depending on the ambition and the size of the project, one can either study the theory of weak convergence on the metric space of counting measures, and/or one can deal with some applications. I can suggest applications in extreme value theory and applications to the occurrence of patterns in sequences.

    Models for genome-wide organization of stem-loops [Bioinformatics and Statistics Master's Thesis]

    The organization of various types of motifs on the genome can be investigated using point processes. Of particular interest is understanding the inhomogeneous occurrence of biologically relevant motifs.

    One type of motif is the stem-loop structure, which is found, for instance, in miRNA genes. This project proposes to model the occurrence of stem-loops using a point process model, and in particular to include such things as nucleotide frequencies and other covariate information in the (inhomogeneous) intensity to provide some explanation of the apparently inhomogeneous occurrence of stem-loops in genomic data. Extensions include Cox processes (describing intensity inhomogeneity via an unobserved stochastic process) and Hawkes models (allowing for dependence between occurrences).
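    To illustrate how a Hawkes model introduces dependence between occurrences, the sketch below simulates a univariate Hawkes process whose intensity is a baseline plus a sum of exponentially decaying contributions, one from each earlier point. It uses a thinning scheme that exploits the fact that the intensity only decays between points. The exponential kernel, the parameter names (`baseline`, `jump`, `decay`) and the univariate setting are simplifying assumptions made for illustration; the thesis project itself would involve covariates and genomic coordinates.

```python
import math
import random


def simulate_hawkes(baseline, jump, decay, horizon, seed=0):
    """Points on [0, horizon] of a Hawkes process with exponential kernel.

    The intensity at time t is
        baseline + sum over earlier points t_i of jump * exp(-decay * (t - t_i)).
    Requires baseline > 0; jump / decay < 1 keeps the process from exploding.
    """
    rng = random.Random(seed)
    events = []
    t = 0.0
    excitation = 0.0  # current value of the self-exciting part of the intensity
    while True:
        # The intensity can only decay until the next point, so the current
        # intensity is an upper bound for generating the next candidate.
        bound = baseline + excitation
        wait = rng.expovariate(bound)
        t += wait
        if t > horizon:
            break
        excitation *= math.exp(-decay * wait)
        # Keep the candidate with probability (intensity at t) / bound.
        if rng.random() * bound <= baseline + excitation:
            events.append(t)
            excitation += jump  # each accepted point raises the intensity
    return events


events = simulate_hawkes(baseline=0.5, jump=0.5, decay=1.0, horizon=1000.0)
print(len(events), "points; long-run rate is baseline / (1 - jump / decay)")
```

    Setting `jump` to 0 removes the dependence and gives back a homogeneous Poisson process with rate `baseline`, which makes a convenient sanity check.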

    Efficient stem-loop and secondary structure searching [Bioinformatics Master's thesis]

    The program StemSearch is a straightforward implementation of a dynamic programming algorithm. The most important features of StemSearch are an understanding of the statistical properties of the resulting stem-loop score, which depend upon the choice of parameters, and the possibility of tailoring the parameters towards a specific search target.

    The computation time can, however, be reduced substantially with the introduction of proper heuristics. This may prove valuable not only in the actual genome-wide searches but especially when optimizing the parameters. A thesis on this subject can introduce heuristics relying on well-known, efficient pattern matching techniques, e.g. suffix trees, with subsequent dynamic programming refinements.

    Wiener-Hopf factorization for Markov additive processes [Master level]

    The Wiener-Hopf factorization identity is a factorization on the transform level (be it Laplace transforms, characteristic functions or moment generating functions) of a distribution on the real line. It is closely related to the study of real valued random walks. There is an analogous matrix factorization identity for random walks that are controlled by an underlying finite state space Markov chain, so-called Markov additive processes. Being able to compute the factors is important and there are some interesting (nasty) problems that occur when considering Markov additive processes compared to random walks.
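    For orientation, one standard form of the identity for an ordinary random walk is written out below in LaTeX notation. The notation is introduced here for illustration: S_n = X_1 + ... + X_n is the random walk with i.i.d. steps, \tau_+ = \inf\{n \ge 1 : S_n > 0\} is the first strict ascending ladder epoch and \tau_- = \inf\{n \ge 1 : S_n \le 0\} is the first weak descending ladder epoch.

```latex
1 - s\,\mathbb{E}\!\left[e^{i\theta X_1}\right]
  = \left(1 - \mathbb{E}\!\left[s^{\tau_+} e^{i\theta S_{\tau_+}};\ \tau_+ < \infty\right]\right)
    \left(1 - \mathbb{E}\!\left[s^{\tau_-} e^{i\theta S_{\tau_-}};\ \tau_- < \infty\right]\right),
\qquad 0 \le s < 1,\ \theta \in \mathbb{R}.
```

    The left-hand side involves only the step distribution, while the two factors on the right describe the ascending and descending ladder variables separately. In the Markov additive setting the scalar transforms are replaced by matrices indexed by the states of the controlling chain.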

    Maximal clusters in non-critical percolation and related models [Master's thesis]

    This is a suggestion to study the paper entitled Maximal clusters in non-critical percolation and related models by Remco van der Hofstad and Frank Redig. They investigate the typical size and the fluctuations of the maximal cluster in percolation models.


    Level: The level indicates the minimal level where I expect that a student can do a project on the topic. A project on master level is doable for a student taking a master's in statistics, mathematics or the like.

    Please consult the individual programs for details on copyright etc. I would also appreciate it if you report bugs according to the guidelines for the individual programs.

    If you use a program for scientific purposes please remember to provide the relevant citation. See the individual programs for details.

    StemSearch is a command-line program implemented in C++ that can scan a genome sequence for putative stem-loop structures. The output contains a ranked list of high-scoring putative stem-loops with a normalized nat-score and an E-value to guide the assessment of statistical significance.

    StemSearch is a dedicated datamining program, and effort is made to discriminate structures from the bulk genome and to give a proper statistical treatment of the results.

    Current version: 0.9. Released July 1, 2008.

    Reference:
    Niels Richard Hansen (2009)
    Statistical models for local occurrences of RNA-structures
    Journal of Computational Biology. June 2009, 16(6): 845-858.

    Installation
    Download the gzipped tar file, gunzip it into an appropriate directory, untar it, and then read and follow the instructions in the file INSTALL.

    The R-package ppstat is available from CRAN for doing point process statistics for multivariate point processes. The methods for the multivariate Hawkes process were used in [2] below. Some theory is found in [1].

    Current version: 0.8. Released March 9, 2012.

    References:
    [1] Niels Richard Hansen
    Penalized maximum likelihood estimation for generalized linear point processes
    Submitted

    [2] Lisbeth Carstensen, Albin Sandelin, Ole Winther and Niels Richard Hansen
    Multivariate Hawkes process models of the occurrence of regulatory elements
    BMC Bioinformatics 2010, 11:456.

    Installation
    Follow the installation instructions from the Download and Installation link below.