🤖 AI Summary
This work addresses the graph label selection problem: given an \( n \)-vertex graph and a labeling budget \( k \), select \( k \) vertices whose labels enable accurate prediction of the labels on the remaining vertices. The authors present an approximation algorithm that achieves, for the first time under the standard budget constraint, an approximation ratio of \( \tilde{O}(\log^{1.5} n) \). Prior approaches either relied on resource augmentation, allowing substantially more than \( k \) labeled vertices, or consisted primarily of heuristics without provable guarantees. Practical heuristic variants of the algorithm scale to significantly larger graphs than previous methods while essentially retaining solution quality.
📝 Abstract
In the graph label selection problem, one is given an $n$-vertex graph and a budget $k$, and seeks to select $k$ vertices whose labels enable accurate prediction of the labels on the remaining vertices. This problem formalizes distilling a small representative set from the whole graph. We present the first $\tilde{O}(\log^{1.5} n)$-approximation algorithm for graph label selection under the standard budget constraint. Prior work either relies on resource augmentation, allowing substantially more than $k$ labeled vertices, or consists primarily of heuristics without provable guarantees. Finally, we demonstrate that practical heuristic variants of our algorithm scale to significantly larger graphs than previous methods, while essentially retaining their quality.