UCPH Statistics Seminar: Tyler Farghly

Title

Understanding Generalisation in Diffusion Models: Implicit Regularisation and the Manifold Hypothesis

Speaker: Tyler Farghly, University of Oxford


Abstract

Diffusion models are a class of generative models that have achieved state-of-the-art results across a diverse range of domains, including image, audio, and video generation, protein synthesis, and language modelling. Their remarkable success in high-dimensional settings raises fundamental questions about how and why they generalise so well. Indeed, it has been shown that under idealised conditions, where training and sampling are performed perfectly, diffusion models tend to memorise their training data, suggesting that their ability to generalise relies on some form of implicit regularisation in the learning and sampling processes.

In this talk, I will present two of our recent works that address this question from complementary perspectives. In the first, we develop an algorithm-dependent framework for analysing diffusion models based on algorithmic stability. By grounding the analysis in algorithmic properties, we identify multiple sources of implicit regularisation unique to diffusion models. In particular, we show how denoising score matching with early stopping (denoising regularisation), coarse discretisation of the sampling process (sampler regularisation), and optimisation with stochastic gradient descent (optimisation regularisation) each contribute to generalisation.

In the second work, we turn to how diffusion models learn geometric structure in data. Specifically, we identify that approximation via smoothing at the level of the score function, or equivalently smoothing in the log-density domain, preserves and identifies low-dimensional geometry in the data. Together, these works offer new perspectives on the generalisation behaviour of diffusion models and suggest promising directions for designing generative models.
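To make the "sampler regularisation" idea concrete, the following is a minimal sketch (not the speaker's method) of reverse-time diffusion sampling for a one-dimensional Gaussian target under a standard variance-preserving SDE, where the exact score is available in closed form. The number of Euler–Maruyama steps controls how coarse the discretisation of the sampling process is; all names and parameter choices below are illustrative.

```python
import numpy as np

# Forward VP-SDE: dx = -0.5 * beta * x dt + sqrt(beta) dW, so
# x_t ~ N(a_t * x_0, 1 - a_t^2) with a_t = exp(-0.5 * beta * t).
# For x_0 ~ N(0, s0^2) the marginal p_t is Gaussian with an exact score.

def score(x, t, beta=1.0, s0=1.0):
    """Exact score d/dx log p_t(x) of the Gaussian marginal."""
    a = np.exp(-0.5 * beta * t)
    var = s0**2 * a**2 + (1.0 - a**2)
    return -x / var

def sample(n_steps, n_samples=5000, T=1.0, beta=1.0, seed=0):
    """Euler--Maruyama discretisation of the reverse-time SDE.

    n_steps sets the coarseness of the discretisation; coarser
    sampling (smaller n_steps) is one source of the implicit
    regularisation discussed in the abstract.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = rng.standard_normal(n_samples)  # start from the prior N(0, 1)
    for k in range(n_steps):
        t = T - k * dt
        # Reverse SDE run forwards in tau = T - t:
        # dx = [0.5 * beta * x + beta * score] dtau + sqrt(beta) dW
        x = x + (0.5 * beta * x + beta * score(x, t, beta)) * dt \
              + np.sqrt(beta * dt) * rng.standard_normal(n_samples)
    return x

samples = sample(n_steps=100)
```

With the target standard deviation s0 = 1, the sampled standard deviation should be close to 1, with a small bias that grows as the discretisation gets coarser.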