🤖 AI Summary
This paper addresses the Ω(n²) time bottleneck for computing, or even approximating, the minimum spanning tree (MST) of n points in an arbitrary metric space. Our method is a two-stage, learning-augmented framework: it first builds a disconnected forest using practical heuristics, then finds a low-weight edge set that reconnects the forest into a spanning tree. Optimally solving the reconnection step still requires Ω(n²) time, so our key contribution is a provably correct O(n²⁻ᵟ) (δ > 0) algorithm that achieves a 2.62-approximation for that step, enhanced via edge-weight pruning and connectivity-aware optimization. We further prove that when the forest overlaps with an optimal MST, the original MST problem can be approximated in subquadratic time, with an approximation factor governed by a measure of that overlap. Experiments across diverse metric spaces yield nearly optimal spanning trees while running orders of magnitude faster than exact algorithms.
📝 Abstract
Finding a minimum spanning tree (MST) for $n$ points in an arbitrary metric space is a fundamental primitive for hierarchical clustering and many other ML tasks, but this takes $\Omega(n^2)$ time to even approximate. We introduce a framework for metric MSTs that first (1) finds a forest of disconnected components using practical heuristics, and then (2) finds a small-weight set of edges to connect disjoint components of the forest into a spanning tree. We prove that optimally solving the second step still takes $\Omega(n^2)$ time, but we provide a subquadratic 2.62-approximation algorithm. In the spirit of learning-augmented algorithms, we then show that if the forest found in step (1) overlaps with an optimal MST, we can approximate the original MST problem in subquadratic time, where the approximation factor depends on a measure of overlap. In practice, we find nearly optimal spanning trees for a wide range of metrics, while being orders of magnitude faster than exact algorithms.
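
To make the two-step framework concrete, here is a minimal Python sketch under several assumptions that are not from the paper: the initial components are supplied by the caller (standing in for the step (1) heuristic), Euclidean distance stands in for an arbitrary metric, and step (2) is done by a naive "one representative per component" heuristic. It is not the 2.62-approximation algorithm described above, and all function names are hypothetical.

```python
import math


def dist(p, q):
    """Euclidean distance; an assumed stand-in for an arbitrary metric."""
    return math.dist(p, q)


def prim_mst(points, members):
    """Exact MST over the point indices in `members` using Prim's algorithm.

    Quadratic in len(members), so it is only cheap when components are small.
    Returns a list of (u, v, weight) edges.
    """
    if len(members) < 2:
        return []
    root = members[0]
    # best[v] = (cheapest known distance from v to the tree, tree endpoint)
    best = {v: (dist(points[root], points[v]), root) for v in members[1:]}
    edges = []
    while best:
        v = min(best, key=lambda u: best[u][0])
        weight, parent = best.pop(v)
        edges.append((parent, v, weight))
        for u in best:
            d = dist(points[v], points[u])
            if d < best[u][0]:
                best[u] = (d, v)
    return edges


def connect_by_representatives(points, components):
    """Naive step (2): link components through one representative each.

    Illustrative heuristic only; NOT the paper's 2.62-approximation.
    """
    representatives = [members[0] for members in components]
    return prim_mst(points, representatives)


def two_stage_spanning_tree(points, components):
    """Step (1) forest (one MST per given component) plus step (2) linking edges."""
    forest = [e for members in components for e in prim_mst(points, members)]
    return forest + connect_by_representatives(points, components)


points = [(0, 0), (1, 0), (0, 1), (10, 0), (11, 0)]
components = [[0, 1, 2], [3, 4]]  # assumed output of a step (1) heuristic
tree = two_stage_spanning_tree(points, components)
total_weight = sum(w for _, _, w in tree)
```

On this toy input the sketch links the two components through points 0 and 3 at cost 10, while an exact MST would use the cheaper edge between points 1 and 3 at cost 9. That gap is the kind of loss a careful step (2) algorithm aims to bound.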