🤖 AI Summary
This paper challenges the classical information-theoretic view that redundancy is inherently inefficient, arguing instead that redundancy constitutes a fundamental structural principle in finite, structured learning systems.
Method: Starting from the family of f-divergences, we develop a geometric framework that unifies mutual information, chi-squared dependence, and spectral redundancy, and formally derive tight upper and lower bounds on redundancy for the first time.
Contribution/Results: We prove that these bounds define an optimal trade-off point for learning: generalization is maximized by balancing over-compression (loss of structural fidelity) against over-coupling (breakdown of stability). The theoretical analysis is validated empirically with masked autoencoder experiments, which show that generalization performance peaks at a specific, measurable redundancy level. This confirms that redundancy is quantifiable and tunable, and that it serves as a bridge between information theory and practical machine learning.
📝 Abstract
We present a theoretical framework that extends classical information theory to finite and structured systems by redefining redundancy as a fundamental property of information organization rather than as an inefficiency. In this framework, redundancy is expressed through a general family of informational divergences that unifies multiple classical measures, such as mutual information, chi-squared dependence, and spectral redundancy, under a single geometric principle. This reveals that these traditional quantities are not isolated heuristics but projections of a shared redundancy geometry. The theory further predicts that redundancy is bounded both above and below, giving rise to an optimal equilibrium that balances over-compression (loss of structure) against over-coupling (collapse). While classical communication theory favors minimal redundancy for transmission efficiency, finite and structured systems, such as those underlying real-world learning, achieve maximal stability and generalization near this equilibrium. Experiments with masked autoencoders illustrate and verify this principle: the model exhibits a stable redundancy level at which generalization peaks. Together, these results establish redundancy as a measurable and tunable quantity that bridges the asymptotic world of communication and the finite world of learning.
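To make the unification claim concrete: mutual information is the KL divergence between a joint distribution and the product of its marginals, and chi-squared dependence (the mean-square contingency) is the chi-squared divergence between the same two distributions; both are instances of an f-divergence D_f = E[f(r)] of the likelihood ratio r = p(x,y)/p(x)p(y), differing only in the choice of f. The sketch below is a minimal numerical illustration of this standard fact, not the paper's formalism; the 2×2 joint distribution is a made-up example.

```python
import numpy as np

# Hypothetical joint distribution of two correlated binary variables.
joint = np.array([[0.4, 0.1],
                  [0.1, 0.4]])

# Product of marginals: the "no redundancy" reference distribution.
px = joint.sum(axis=1, keepdims=True)
py = joint.sum(axis=0, keepdims=True)
indep = px * py

# Likelihood ratio r = p(x,y) / p(x)p(y); every f-divergence
# D_f = sum indep * f(r) probes the same departure from independence
# through a different convex lens f.
r = joint / indep

# f(t) = t * log(t)  ->  KL divergence = mutual information (nats)
mi = np.sum(indep * r * np.log(r))

# f(t) = (t - 1)^2   ->  chi-squared dependence (mean-square contingency)
chi2 = np.sum(indep * (r - 1.0) ** 2)

print(mi, chi2)  # both are 0 iff the variables are independent
```

Swapping in other convex functions f yields further members of the same family, which is the sense in which these classical measures are projections of one underlying redundancy geometry.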