🤖 AI Summary
This paper addresses the challenge of preserving key structural properties (degree distribution, clustering coefficient, edge betweenness, and centrality measures) during network dimensionality reduction and sparsification. The authors propose an unsupervised, domain-agnostic framework grounded in algorithmic information theory: it minimizes the loss of algorithmic information, collapsing regions that a computable candidate model can regenerate, to jointly achieve data coarse-graining, feature selection, and network pruning. They introduce the paradigm of “lossless compression-driven lossy compression”, which avoids reliance on popular compression algorithms and supports network sparsification, image segmentation, and multidimensional data compression within one framework. Core technical contributions include Kolmogorov complexity estimation, Turing machine simulation, the InfoRank and MILS algorithms, and image segmentation based on algorithmic probability. Experiments on both synthetic and real-world networks show that the measured graph-theoretic indices are preserved, with results equal to or significantly better than other data reduction methods and some of the leading network sparsification methods.
📝 Abstract
We introduce a family of unsupervised, domain-free, and (asymptotically) model-independent algorithms based on the principles of algorithmic probability and information theory, designed to minimize the loss of algorithmic information, including a lossless-compression-based lossy compression algorithm. The methods can select and coarse-grain data according to algorithmic complexity (without the use of popular compression algorithms) by collapsing regions that can be procedurally regenerated from a computable candidate model. We show that the method can preserve the salient properties of objects and perform dimension reduction, denoising, feature selection, and network sparsification. As a validation case, we demonstrate that the method preserves all the graph-theoretic indices measured on a well-known set of synthetic and real-world networks of very different natures, ranging from degree distribution and clustering coefficient to edge betweenness and degree and eigenvector centralities, achieving results equal to or significantly better than those of other data reduction methods and some of the leading network sparsification methods. The methods (InfoRank, MILS) can also be applied to tasks such as image segmentation based on algorithmic probability.
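The sparsification idea described above, ranking elements by how much information they contribute and removing the least informative ones first, can be illustrated with a minimal sketch. This is not the authors' InfoRank or MILS implementation: the function names, the greedy loop, and the `target_edges` stopping rule are assumptions made for illustration. In particular, `proxy_complexity` uses zlib only so that the example runs on the standard library, whereas the abstract states that the methods do not rely on popular compression algorithms. Any complexity estimator with the same signature can be passed in instead.

```python
import zlib
from itertools import combinations


def adjacency_bits(n, edges):
    """Encode the upper triangle of an undirected graph's adjacency matrix as '0'/'1' bytes."""
    edge_set = {frozenset(e) for e in edges}
    bits = ("1" if frozenset((i, j)) in edge_set else "0"
            for i, j in combinations(range(n), 2))
    return "".join(bits).encode()


def proxy_complexity(n, edges):
    # ASSUMPTION: compressed length is a crude stand-in for an
    # algorithmic-complexity estimate; it is not the paper's estimator.
    return len(zlib.compress(adjacency_bits(n, edges), 9))


def information_ranking(n, edges, complexity=proxy_complexity):
    """Return (|change in complexity|, edge) pairs, least informative edge first."""
    base = complexity(n, edges)
    scored = []
    for e in edges:
        rest = [x for x in edges if x != e]
        scored.append((abs(base - complexity(n, rest)), e))
    return sorted(scored)


def sparsify(n, edges, target_edges, complexity=proxy_complexity):
    """Greedily delete the edge whose removal changes the estimate least."""
    edges = sorted({tuple(sorted(e)) for e in edges})
    while len(edges) > target_edges:
        _, victim = information_ranking(n, edges, complexity)[0]
        edges.remove(victim)
    return edges


# Hypothetical toy graph: an 8-node ring plus two chords.
ring = [(i, (i + 1) % 8) for i in range(8)]
toy_edges = ring + [(0, 4), (2, 6)]
sparse_edges = sparsify(8, toy_edges, target_edges=7)
```

The ranking is recomputed after every deletion because removing one edge changes the contribution of the others. This costs one complexity estimate per remaining edge per step, which is acceptable for a toy graph but would need batching or caching on larger networks.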