Rooting Out Entropy: Optimal Tree Extraction for Ultra-Succinct Graphs

📅 2026-03-15

🤖 AI Summary

This work introduces the Minimum-Entropy Tree-Extraction (MINETREX) problem: when losslessly compressing an unlabeled graph, choose a spanning forest to remove so that the residual graph has minimal indegree entropy. The paper proves that MINETREX is NP-hard to approximate with additive error better than $\delta n$ for some constant $\delta > 0$, via a connection to an approximate variant of the hitting set problem on biregular instances. Combining canonical vertex names, entropy-compressed adjacency lists, and greedy forest extraction, the authors give a simple algorithm with additive error at most $n/\ln 2$. The resulting data structure represents a static unlabeled graph in degree-entropy compressed ("ultrasuccinct") space, supports navigational queries in logarithmic time, and can serve as a drop-in replacement for adjacency lists, using substantially less space for most graphs.

📝 Abstract

We combine two methods for the lossless compression of unlabeled graphs (entropy-compressing adjacency lists and computing canonical names for vertices) and solve an ensuing novel optimisation problem: Minimum-Entropy Tree-Extraction (MINETREX). MINETREX asks to determine a spanning forest $F$ to remove from a graph $G$ so that the remaining graph $G-F$ has minimal indegree entropy $H(d_1,\ldots,d_n) = \sum_{v\in V} d_v \log_2(m/d_v)$ among all choices for $F$. (Here $d_v$ is the indegree of vertex $v$ in $G-F$; $m$ is the number of edges.) We show that MINETREX is NP-hard to approximate with additive error better than $\delta n$ (for some constant $\delta > 0$), and provide a simple greedy algorithm that achieves additive error at most $n / \ln 2$. By storing the extracted spanning forest and the remaining edges separately, we obtain a degree-entropy compressed ("ultrasuccinct") data structure for representing an arbitrary (static) unlabeled graph that supports navigational graph queries in logarithmic time. It serves as a drop-in replacement for adjacency-list representations using substantially less space for most graphs; we precisely quantify these savings in terms of the maximal subgraph density. Our inapproximability result uses an approximate variant of the hitting set problem on biregular instances whose hardness proof is contained implicitly in a reduction by Guruswami and Trevisan (APPROX/RANDOM 2005); we consider the unearthing of this reduction partner of independent interest with further likely uses in hardness of approximation.
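To make the objective concrete, the following minimal sketch evaluates the indegree entropy $H$ from the abstract for a small graph before and after removing a forest. It is an illustration only, not the paper's algorithm: the function name `indegree_entropy`, the representation of edges as `(tail, head)` pairs, the choice of $m$ as the number of edges in the graph being measured, and the convention that vertices with $d_v = 0$ contribute nothing are all assumptions made here.

```python
from collections import Counter
from math import log2


def indegree_entropy(edges, m=None):
    """Evaluate H = sum_v d_v * log2(m / d_v) for a list of (tail, head) edges.

    Assumptions (not taken from the paper): edges are already oriented,
    m defaults to the number of edges passed in, and vertices with
    indegree 0 contribute 0 to the sum.
    """
    indeg = Counter(head for _tail, head in edges)
    if m is None:
        m = len(edges)
    return sum(d * log2(m / d) for d in indeg.values() if d > 0)


# Hypothetical toy graph on vertices 0..3; each pair is (tail, head).
G = [(0, 1), (0, 2), (1, 2), (3, 2)]

# A hypothetical forest to extract (a single edge is trivially acyclic).
F = [(0, 1)]

# Residual graph G - F: every remaining edge points at vertex 2.
residual = [e for e in G if e not in F]

h_full = indegree_entropy(G)
h_residual = indegree_entropy(residual)
print(h_full, h_residual)
```

In this toy instance the residual graph has all of its edges entering a single vertex, so its indegree entropy is 0, while the full graph has strictly positive entropy. A MINETREX solver would search over forests $F$ for the one minimising this quantity; the sketch only scores one given choice.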

Problem

Research questions and friction points this paper is trying to address.

graph compression

entropy minimization

spanning forest

indegree entropy

ultra-succinct representation

Innovation

Methods, ideas, or system contributions that make the work stand out.

minimum-entropy tree extraction

ultra-succinct graph representation

indegree entropy