Information Theory and Statistical Learning

📅 2026-05-04

🤖 AI Summary

This work uses information-theoretic tools to explain how statistical learning models are trained, with particular attention to generative models. It organizes the material around three divergence-related concepts, the f-divergence, the Fisher divergence, and the evidence lower bound (ELBO), and applies them to methods ranging from linear and logistic regression to autoregressive models, variational autoencoders, generative adversarial networks, score-based models, and diffusion models. The derivation of the generative diffusion model is more systematic and explicit than is typical in the literature. The exposition is structured for teaching and self-study and requires only basic background in information theory and statistics.

📝 Abstract

This manuscript contains a preprint of a chapter under consideration for inclusion in the forthcoming third edition of Cover and Thomas's Elements of Information Theory, posted with permission from Wiley. The table of contents of the new edition (EIT-3 ToC) can be found at: https://docs.google.com/document/d/1L-m4oQEJw1PJhoxBeMwrrBD8S_HmvzMEkPbYvS24980/edit?usp=sharing . For feedback, please contact abbas@ee.stanford.edu. Learning and information theory intersect in both model training and the characterization of fundamental performance limits. This manuscript provides a concise and accessible treatment of the first intersection, requiring only basic background in information theory and statistics at the senior undergraduate or first-year graduate level. End-of-chapter exercises make the material well suited for classroom use as well as self-study. The chapter focuses on the role of divergence measures in model training, with examples ranging from linear and logistic regression to autoregressive models, variational autoencoders, diffusion models, generative adversarial networks, and score-based models. It introduces the evidence lower bound (ELBO), $f$-divergences, and the Fisher divergence. In particular, the treatment of the generative diffusion model provides a more systematic and explicit derivation than is typical in the literature.
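As a rough illustration of the divergence measures the abstract refers to, the sketch below evaluates an $f$-divergence between two discrete distributions using the common textbook form $D_f(p\|q) = \sum_x q(x)\, f(p(x)/q(x))$. This definition, the function names, and the example distributions are assumptions made here for illustration; the chapter's own notation and conventions may differ. Logarithms are natural, so values are in nats.

```python
import math

def f_divergence(p, q, f):
    """Discrete f-divergence: sum over x of q(x) * f(p(x) / q(x)).

    Assumes p and q are probability vectors of equal length with q(x) > 0.
    """
    return sum(qx * f(px / qx) for px, qx in zip(p, q))

def kl_generator(t):
    # f(t) = t log t recovers the KL divergence D(p || q); 0 log 0 is taken as 0.
    return t * math.log(t) if t > 0 else 0.0

def tv_generator(t):
    # f(t) = |t - 1| / 2 recovers the total variation distance.
    return 0.5 * abs(t - 1.0)

# Hypothetical example distributions on a binary alphabet.
p = [0.5, 0.5]
q = [0.25, 0.75]

kl = f_divergence(p, q, kl_generator)
tv = f_divergence(p, q, tv_generator)

# Direct KL computation, sum of p(x) log(p(x) / q(x)), as a cross-check.
kl_direct = sum(px * math.log(px / qx) for px, qx in zip(p, q))

print(f"KL(p||q) via f-divergence: {kl:.6f}")
print(f"KL(p||q) computed directly: {kl_direct:.6f}")
print(f"Total variation: {tv:.6f}")
```

Swapping the generator function changes the divergence without changing the surrounding code, which is one way to see why a single definition can cover several training objectives.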

Problem

Research questions and friction points this paper is trying to address.

information theory

statistical learning

divergence measures

model training

generative models

Innovation

Methods, ideas, or system contributions that make the work stand out.

f-divergence

Fisher divergence

evidence lower bound (ELBO)