A Generalized Information Bottleneck Theory of Deep Learning

📅 2025-09-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
The Information Bottleneck (IB) theory offers valuable insights into neural network learning but suffers from theoretical ambiguity and practical intractability due to the difficulty of estimating mutual information. To address these limitations, we propose the Generalized Information Bottleneck (GIB) framework, which replaces mutual information with interaction information (II) and introduces average interaction information as a computationally tractable, cooperative measure of representation synergy. GIB reformulates the IB objective of balancing compression and prediction while preserving theoretical compatibility with classical IB. Crucially, GIB significantly improves estimability and broadens applicability across diverse architectures. Empirical evaluation demonstrates that GIB consistently captures sharp compression phase transitions during training in ReLU networks, CNNs, and Transformers. Moreover, the learned representations exhibit strong alignment with model adversarial robustness. Overall, GIB provides a more rigorous, interpretable, and practically deployable theoretical foundation for information-theoretic analysis of deep learning.
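
To make the quantities in the summary concrete, the sketch below writes out the classical IB objective and the usual three-variable interaction information, followed by one plausible reading of an averaged synergy term. The last line is an illustrative assumption based on the phrase "average interaction information of each feature with those remaining"; it is not the paper's stated GIB functional.

```latex
% Classical IB objective and the usual definition of interaction information.
% The averaged synergy term Syn(T;Y) is an illustrative assumption, not the
% paper's stated GIB objective; sign conventions for II also vary.
\begin{align}
  \mathcal{L}_{\mathrm{IB}} &= I(X;T) - \beta\, I(T;Y), \\
  II(A;B;C) &= I(A;B \mid C) - I(A;B), \\
  \mathrm{Syn}(T;Y) &\approx \frac{1}{d}\sum_{i=1}^{d} II\!\left(T_i;\, T_{\setminus i};\, Y\right).
\end{align}
```

Here T = (T_1, ..., T_d) is the learned representation, T_{\setminus i} denotes all features except T_i, and beta trades off compression against prediction.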

📝 Abstract
The Information Bottleneck (IB) principle offers a compelling theoretical framework to understand how neural networks (NNs) learn. However, its practical utility has been constrained by unresolved theoretical ambiguities and significant challenges in accurate estimation. In this paper, we present a Generalized Information Bottleneck (GIB) framework that reformulates the original IB principle through the lens of synergy, i.e., the information obtainable only through joint processing of features. We provide theoretical and empirical evidence demonstrating that synergistic functions achieve superior generalization compared to their non-synergistic counterparts. Building on these foundations, we reformulate the IB using a computable definition of synergy based on the average interaction information (II) of each feature with those remaining. We demonstrate that the original IB objective is upper bounded by our GIB in the case of perfect estimation, ensuring compatibility with existing IB theory while addressing its limitations. Our experimental results demonstrate that GIB consistently exhibits compression phases across a wide range of architectures (including those with ReLU activations, where the standard IB fails), while yielding interpretable dynamics in both CNNs and Transformers and aligning more closely with our understanding of adversarial robustness.
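
As a rough illustration of why the averaged interaction-information quantity in the abstract is easier to estimate than the mutual-information terms in the classical IB, here is a minimal plug-in estimator over discretized activations. It is a sketch under assumed choices (integer binning of activations, collapsing the remaining features into a single joint symbol, and the II(A;B;C) = I(A;B|C) - I(A;B) sign convention); the function names and the binning scheme are hypothetical, and this is not the authors' estimator.

```python
# Hypothetical sketch: plug-in estimate of the average interaction information
# of each representation feature with the remaining features and the label.
# Assumes activations have already been discretized into integer bins.
import numpy as np

def entropy(*cols):
    """Plug-in Shannon entropy (bits) of the joint distribution of discrete columns."""
    joint = np.stack(cols, axis=1)
    _, counts = np.unique(joint, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def mutual_info(a, b):
    # I(A;B) = H(A) + H(B) - H(A,B)
    return entropy(a) + entropy(b) - entropy(a, b)

def cond_mutual_info(a, b, c):
    # I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)
    return entropy(a, c) + entropy(b, c) - entropy(a, b, c) - entropy(c)

def interaction_info(a, b, c):
    # II(A;B;C) = I(A;B|C) - I(A;B); the sign convention varies in the literature.
    return cond_mutual_info(a, b, c) - mutual_info(a, b)

def average_interaction_info(T_binned, y):
    """Average II of each feature with the remaining features and the labels y.

    T_binned: (n_samples, d) integer-binned activations; y: (n_samples,) integer labels.
    The remaining features are collapsed to one joint symbol per sample, which is one
    simple (assumed) way of making the per-feature quantity computable.
    """
    n, d = T_binned.shape
    scores = []
    for i in range(d):
        rest = T_binned[:, [j for j in range(d) if j != i]]
        # Encode the remaining features jointly as a single discrete column.
        rest_codes = np.unique(rest, axis=0, return_inverse=True)[1].ravel()
        scores.append(interaction_info(T_binned[:, i], rest_codes, y))
    return float(np.mean(scores))
```

Evaluated on binned hidden activations at successive training checkpoints, a curve of this quantity would, under these assumptions, make any compression-like phase transitions directly visible, which is the kind of dynamics the abstract reports for ReLU networks, CNNs, and Transformers.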
Problem

Research questions and friction points this paper is trying to address.

Developing a generalized information bottleneck framework using synergy principles
Addressing theoretical ambiguities and mutual-information estimation challenges in analyzing neural networks
Improving generalization and interpretability across diverse network architectures
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reformulates the Information Bottleneck using a synergy principle
Uses average interaction information as a computable measure of synergy
Exhibits compression phases across diverse neural architectures
Charles Westphal
University College London, London, WC1E 6BT, UK
Stephen Hailes
University College London, London, WC1E 6BT, UK
Mirco Musolesi
University College London
Machine Intelligence · Machine Learning · Generative Models · Multi-Agent Systems · AI and Society