ON-LINE: Seminar in applied mathematics and statistics

SPEAKER: Peter Harremoës (Copenhagen Business College).

TITLE: The rate distortion test.

ABSTRACT: Lossy source coding and rate distortion theory have formed an important branch of information theory since Shannon introduced them in the 1950s. Although the theory is well established within information theory, applications in statistics and other scientific fields are surprisingly few. In this talk I will present applications to goodness-of-fit testing. The basic problem is how to compare a continuous model with a discrete data set. Data may be grouped into bins, but this requires that we decide both the number and the shape of these bins. In the literature one can find many rules of thumb for choosing such bins. Using ideas from rate distortion theory we can quantify how good a given set of bins is, and we can numerically find nearly optimal bins. If we allow soft bins, it even follows that there is an optimal way to choose the shape of the bins. There is no correct way of choosing the number of bins. Instead there will be a trade-off between the test being powerful against a certain alternative and the test being powerful against any alternative. In the special case of testing normality, the rate distortion test simply corresponds to smoothing the data with a Gaussian.