AI Summary
Quantifying higher-order information interactions in distributed systems, such as biological and artificial neural networks, remains challenging due to semantic ambiguities in existing frameworks like Partial Information Decomposition (PID).
Method: This paper introduces the "Shannon invariants" framework, grounded exclusively in Shannon's entropy axioms, to establish an axiomatically rigorous multivariate information decomposition. By deriving the decomposition solely from fundamental entropy principles, it eliminates long-standing semantic inconsistencies. The framework is inherently scalable and interpretable, enabling cross-scale and cross-architecture analysis of information processing, and, coupled with higher-order dependency modeling and optimized algorithms, it supports efficient computation.
Results: We successfully identify layer-specific information processing signatures across diverse deep neural networks, characterize the dynamic evolution of information flow during training, and achieve substantial improvements in both decomposition efficiency and theoretical consistency on systems with up to one thousand nodes.
Abstract
Distributed systems, such as biological and artificial neural networks, process information via complex interactions engaging multiple subsystems, resulting in high-order patterns with distinct properties across scales. Investigating how these systems process information remains challenging due to difficulties in defining appropriate multivariate metrics and ensuring their scalability to large systems. To address these challenges, we introduce a novel framework based on what we call "Shannon invariants": quantities that capture essential properties of high-order information processing in a way that depends only on the definition of entropy and can be efficiently calculated for large systems. Our theoretical results demonstrate how Shannon invariants can be used to resolve long-standing ambiguities regarding the interpretation of widely used multivariate information-theoretic measures. Moreover, our practical results reveal distinctive information-processing signatures of various deep learning architectures across layers, leading to new insights into how these systems process information and how this processing evolves during training. Overall, our framework resolves fundamental limitations in analyzing high-order phenomena and offers broad opportunities for theoretical developments and empirical analyses.
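To make the idea of entropy-only quantities concrete, the sketch below computes two classic multivariate measures that are expressible purely in terms of Shannon entropies, total correlation and dual total correlation (and their difference, the O-information), on a toy three-variable XOR system. These particular measures and the toy system are our illustrative choices; they are not necessarily the paper's specific construction or algorithm.

```python
from itertools import product
from math import log2

def entropy(p):
    """Shannon entropy in bits of a distribution {outcome: probability}."""
    return -sum(q * log2(q) for q in p.values() if q > 0)

def marginal(p, idx):
    """Marginal distribution over the variables at positions `idx`."""
    out = {}
    for x, q in p.items():
        key = tuple(x[i] for i in idx)
        out[key] = out.get(key, 0.0) + q
    return out

# Toy system: X and Y are fair coins, Z = X XOR Y (a synergistic triplet).
p = {(x, y, x ^ y): 0.25 for x, y in product([0, 1], repeat=2)}

n = 3
H_joint = entropy(p)
H_singles = [entropy(marginal(p, (i,))) for i in range(n)]
H_rests = [entropy(marginal(p, tuple(j for j in range(n) if j != i)))
           for i in range(n)]

# Total correlation: sum of marginal entropies minus joint entropy.
tc = sum(H_singles) - H_joint
# Dual total correlation: H(X) - sum_i H(X_i | X_{-i}),
# using H(X_i | X_{-i}) = H(X) - H(X_{-i}).
dtc = (1 - n) * H_joint + sum(H_rests)
# O-information: TC - DTC; negative values indicate synergy dominance.
o_info = tc - dtc

print(tc, dtc, o_info)  # 1.0 2.0 -1.0
```

For the XOR triplet the O-information is negative, reflecting the synergy-dominated character of the system; every quantity above is obtained from entropies alone, which is the property that makes such measures cheap to evaluate as the system grows.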