🤖 AI Summary
This paper studies coreset construction for the $k$-clustering problem under two metric spaces: shortest-path metrics of planar graphs and Fréchet metrics on multidimensional polygonal curves. We propose a unified framework that combines a tight VC-dimension-based analysis, sensitivity sampling, and geometric structural characterization, including hierarchical decompositions of planar graphs and controlled complexity of curve segments. Our key contribution is a sharp VC-dimension-based analysis of coreset construction, which yields improved $k$-median coreset size upper bounds: $\widetilde{O}(k \varepsilon^{-2})$ for planar graph metrics, improving over the prior $\widetilde{O}(k^2 \varepsilon^{-4})$ and $\widetilde{O}(k \varepsilon^{-6})$ results; and $\widetilde{O}(k d \ell \varepsilon^{-2} \log m)$ for clustering $d$-dimensional polygonal curves of length at most $m$ with center curves of length at most $\ell$, improving over the previous bounds $\widetilde{O}(k^3 d \ell \varepsilon^{-3} \log m)$ and $\widetilde{O}(k^2 d \ell \varepsilon^{-2} \log m \log |P|)$. Smaller coresets reduce the cost of estimating the clustering objective for any candidate solution on large inputs.
📝 Abstract
We consider coresets for $k$-clustering problems, where the goal is to assign points to centers minimizing powers of distances. A popular example is the $k$-median objective $\sum_{p\in P}\min_{c\in C}\mathrm{dist}(p,c)$. Given a point set $P$, a coreset $\Omega$ is a small weighted subset that approximates the cost of $P$ for all candidate solutions $C$ up to a $(1\pm\varepsilon)$ multiplicative factor. In this paper, we give a sharp VC-dimension based analysis for coreset construction. As a consequence, we obtain improved $k$-median coreset bounds for the following metrics: Coresets of size $\tilde{O}\left(k\varepsilon^{-2}\right)$ for shortest path metrics in planar graphs, improving over the bounds $\tilde{O}\left(k\varepsilon^{-6}\right)$ by [Cohen-Addad, Saulpic, Schwiegelshohn, STOC'21] and $\tilde{O}\left(k^2\varepsilon^{-4}\right)$ by [Braverman, Jiang, Krauthgamer, Wu, SODA'21]. Coresets of size $\tilde{O}\left(kd\ell\varepsilon^{-2}\log m\right)$ for clustering $d$-dimensional polygonal curves of length at most $m$ with curves of length at most $\ell$ with respect to Fréchet metrics, improving over the bounds $\tilde{O}\left(k^3d\ell\varepsilon^{-3}\log m\right)$ by [Braverman, Cohen-Addad, Jiang, Krauthgamer, Schwiegelshohn, Toftrup, and Wu, FOCS'22] and $\tilde{O}\left(k^2d\ell\varepsilon^{-2}\log m \log |P|\right)$ by [Conradi, Kolbe, Psarros, Rohde, SoCG'24].
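To make the coreset definition concrete, the following Python sketch evaluates the $k$-median cost of a point set and of a weighted subset for one candidate solution $C$, and reports the relative error between the two. It is only an illustration under stated assumptions: the Euclidean `dist` stands in for the planar shortest-path or Fréchet metrics, and `uniform_sample_coreset` is a hypothetical baseline (uniform sampling with equal weights), not the construction analyzed in the paper. All function names are illustrative.

```python
import random


def dist(p, q):
    """Euclidean distance; a stand-in for any metric (assumption)."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def kmedian_cost(points, centers, weights=None):
    """Weighted k-median cost: sum over p of w(p) * min over c of dist(p, c)."""
    if weights is None:
        weights = [1.0] * len(points)
    return sum(w * min(dist(p, c) for c in centers)
               for p, w in zip(points, weights))


def uniform_sample_coreset(points, size, rng):
    """Hypothetical baseline, NOT the paper's method: sample `size` points
    uniformly with replacement and give each the weight n / size."""
    sample = [rng.choice(points) for _ in range(size)]
    weight = len(points) / size
    return sample, [weight] * size


def relative_error(points, subset, weights, centers):
    """|cost(subset) - cost(points)| / cost(points) for one solution C.
    A coreset must keep this at most epsilon for every candidate C."""
    full = kmedian_cost(points, centers)
    approx = kmedian_cost(subset, centers, weights)
    return abs(approx - full) / full


rng = random.Random(0)
P = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2000)]
C = [(-1.0, 0.0), (1.0, 0.0)]  # one candidate solution with k = 2
S, w = uniform_sample_coreset(P, 200, rng)
print(relative_error(P, S, w, C))
```

Note that the check above covers a single candidate solution; the coreset guarantee is stronger, since it must hold simultaneously for all $C$, which is where the VC-dimension analysis enters.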