🤖 AI Summary
Existing approaches to modeling neural network complexity rely heavily on statistical entropy and fail to capture causal algorithmic regularities. Method: This work proposes a paradigm for analyzing Binary Neural Network (BNN) training grounded in algorithmic information theory, introducing algorithmic probability and the universal distribution into BNN training analysis for the first time. Framed around the principle of "training as algorithmic compression," it establishes a causally interpretable framework for dynamically characterizing complexity. The Block Decomposition Method (BDM) is employed to approximate algorithmic complexity, enabling stable measurement of structural evolution. Results: Experiments demonstrate that BDM-based complexity measures are more robust than conventional entropy-based metrics, consistently showing stronger negative correlation with training loss across diverse model scales and random seeds, providing key empirical validation of the algorithmic compression hypothesis.
📝 Abstract
Understanding and controlling the informational complexity of neural networks is a central challenge in machine learning, with implications for generalization, optimization, and model capacity. While most approaches rely on entropy-based loss functions and statistical metrics, these measures often fail to capture deeper, causally relevant algorithmic regularities embedded in network structure. We propose a shift toward algorithmic information theory, using Binarized Neural Networks (BNNs) as a first proxy. Grounded in algorithmic probability (AP) and the universal distribution it defines, our approach characterizes learning dynamics through a formal, causally grounded lens. We apply the Block Decomposition Method (BDM) -- a scalable approximation of algorithmic complexity based on AP -- and demonstrate that it more closely tracks structural changes during training than entropy, consistently exhibiting stronger correlations with training loss across varying model sizes and randomized training runs. These results support the view of training as a process of algorithmic compression, where learning corresponds to the progressive internalization of structured regularities. In doing so, our work offers a principled estimate of learning progression and suggests a framework for complexity-aware learning and regularization, grounded in first principles from information theory, complexity, and computability.
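For intuition, BDM estimates the algorithmic complexity of an object by partitioning it into small blocks, looking up each unique block's complexity in a precomputed CTM (Coding Theorem Method) table, and adding a logarithmic correction for repeated blocks: BDM(x) = Σ [CTM(block) + log2(multiplicity)]. The sketch below illustrates this decomposition on a 1-D bit string; the `ctm` function here is a crude placeholder of our own, not a real CTM table (real values come from enumerating small Turing machines, as provided by packages such as `pybdm`):

```python
import math
from collections import Counter

def bdm_1d(bits, block_size=4, ctm=None):
    """Sketch of the Block Decomposition Method on a bit string.

    Partitions the string into fixed-size blocks, then sums the CTM
    complexity of each *unique* block plus log2 of how often it repeats.
    Repetition is cheap: n copies of a block cost only log2(n) extra bits.
    """
    if ctm is None:
        # Placeholder standing in for a real CTM lookup table -- NOT
        # actual algorithmic-complexity values, illustration only.
        ctm = lambda b: 2.0 + 0.5 * len(set(b))
    blocks = [bits[i:i + block_size] for i in range(0, len(bits), block_size)]
    counts = Counter(blocks)
    return sum(ctm(b) + math.log2(n) for b, n in counts.items())

# A highly regular string decomposes into one repeated block and scores
# lower than a string with many distinct blocks.
print(bdm_1d("0" * 16))              # one unique block, repeated 4 times
print(bdm_1d("0110101101001110"))    # four distinct blocks
```

Even with a toy complexity table, the structure of the estimator is visible: regularity shows up as block reuse, which BDM charges only logarithmically, whereas block-level Shannon entropy would miss any within-block algorithmic structure.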