UCPH Statistics Seminar: Simon Buchholz
Title: Recovering Cluster Structure from Contextual Interactions
Abstract: Identifiable representation learning seeks to determine when latent structure can be uniquely inferred from observed data. In this talk, we first discuss opportunities and challenges associated with this framework. We then focus on a specific instance of the problem: recovering a discrete cluster structure among tokens when direct feature-based information is unavailable. In this setting, the only learning signal comes from how tokens interact with one another. We analyze this problem through the lenses of information theory and identifiability, and establish conditions under which gradient-based optimization on token embeddings provably recovers the underlying cluster structure.