Information Plane Analysis of Binary Neural Networks

📅 2026-05-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the unreliability of mutual information estimation in high-dimensional continuous representations, which undermines the statistical validity of information plane analysis. To circumvent this issue, the authors investigate the information plane within binary neural networks, leveraging the finite mutual information induced by discrete activations. Combining plug-in entropy estimators with large-scale training across 375 models, they delineate the reliable regimes of mutual information estimation as a function of sample size and dimensionality. Their findings reveal that late-stage compression is a prevalent phenomenon; however, they find no universal link between compression and generalization performance. Instead, the relationship is highly contingent on the specific task, architecture, and regularization scheme, thereby challenging the broad applicability of the Information Bottleneck theory.
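The "finite mutual information" point can be made concrete with a short worked bound. This is a sketch under stated assumptions, not the paper's own derivation: the symbols $X$ (input), $T = f(X)$ (a deterministic layer representation with $D$ binary units), and $\hat{p}$ (empirical frequencies over $N$ samples) are notation introduced here for illustration; only $N$ and $D$ come from the abstract.

```latex
% Assumed notation: T = f(X) is a deterministic map into \{0,1\}^D.
\begin{align}
  I(X;T) &= H(T) - H(T \mid X) = H(T) \le \log_2 2^D = D \ \text{bits}, \\
  \hat{H}_N(T) &= -\sum_{t} \hat{p}(t) \log_2 \hat{p}(t)
      \le \log_2 \min\!\left(N,\, 2^D\right) = \min\!\left(\log_2 N,\, D\right).
\end{align}
```

The first line holds because $H(T \mid X) = 0$ for a deterministic discrete representation. The second holds because $N$ samples can occupy at most $\min(N, 2^D)$ distinct binary patterns, so when nearly every sample maps to its own pattern the plug-in estimate approaches $\log_2 N$ regardless of the true entropy.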

📝 Abstract

Information plane (IP) analysis has been proposed as a way to study the training dynamics of deep neural networks through the mutual information (MI) between inputs, representations, and targets. However, its statistical validity is often compromised by the difficulty of estimating MI from samples of high-dimensional, deterministic representations. In this work, we perform IP analyses on binary neural networks (BNNs), where activations are discrete and MI is finite. We characterise the finite-sample behaviour of the plug-in entropy estimator and identify regimes for sample size $N$ and representation dimensionality $D$ under which MI estimates are reliable. Outside these regimes, we show that empirical MI estimates saturate to $\log_2 N$, rendering IP trajectories uninformative. Restricting attention to the reliable regime, we train 375 BNNs to investigate the existence of late-stage compression phases and the relationship between compressed representations and generalisation performance. Our results show that while late-stage compression is frequently observed, compressed latent representations do not consistently correlate with improved generalisation performance. Instead, the relationship between compression and generalisation is highly dependent on task, architecture, and regularisation.
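The saturation effect described in the abstract can be illustrated with a minimal sketch of a plug-in entropy estimator applied to synthetic binary codes. This is not the authors' code: the function name `plugin_entropy_bits`, the use of uniformly random codes in place of real BNN activations, and the chosen values of $N$ and $D$ are all assumptions made for illustration.

```python
import numpy as np


def plugin_entropy_bits(codes: np.ndarray) -> float:
    """Plug-in (empirical-frequency) entropy estimate, in bits.

    `codes` is an (N, D) array whose rows are binary activation patterns.
    Each distinct row is treated as one symbol of a discrete alphabet.
    """
    # Count how often each distinct D-bit pattern occurs among the N rows.
    _, counts = np.unique(codes, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


# Hypothetical stand-in for BNN activations: uniformly random binary codes.
rng = np.random.default_rng(0)
N = 1024
for D in (4, 8, 64):
    codes = rng.integers(0, 2, size=(N, D))
    estimate = plugin_entropy_bits(codes)
    ceiling = min(D, np.log2(N))  # the estimate can never exceed this
    print(f"D={D:3d}  plug-in H = {estimate:.3f} bits  (ceiling {ceiling:.3f})")
```

For small $D$ the estimate is limited by $D$ and tracks the true entropy of the codes closely, since $N$ is much larger than the $2^D$ possible patterns. For large $D$ almost every sample lands on its own pattern, so the estimate is pinned near $\log_2 N$ even though the true entropy here is $D$ bits; this is the regime the abstract describes as uninformative.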

Problem

Research questions and friction points this paper is trying to address.

Information Plane

Mutual Information

Binary Neural Networks

Compression

Generalization

Innovation

Methods, ideas, or system contributions that make the work stand out.

Information Plane

Binary Neural Networks

Mutual Information Estimation