A Tight VC-Dimension Analysis of Clustering Coresets with Applications

📅 2025-01-11
🤖 AI Summary
This paper studies coreset construction for the $k$-clustering problem in two metric spaces: shortest-path metrics of planar graphs and Fréchet metrics on polygonal curves. It proposes a unified framework combining a sharp VC-dimension-based analysis, sensitivity sampling, and geometric structural characterization, including hierarchical decompositions of planar graphs and controlled complexity of curve segments. The key contribution is the sharp VC-dimension-based analysis, which yields improved $k$-median coreset size upper bounds: $\widetilde{O}(k \varepsilon^{-2})$ for planar graph metrics, improving over the prior $\widetilde{O}(k^2 \varepsilon^{-4})$ and $\widetilde{O}(k \varepsilon^{-6})$ results; and $\widetilde{O}(k d \ell \varepsilon^{-2} \log m)$ for clustering $d$-dimensional polygonal curves of length at most $m$ with curves of length at most $\ell$, improving over the previous $\widetilde{O}(k^3 d \ell \varepsilon^{-3} \log m)$ and $\widetilde{O}(k^2 d \ell \varepsilon^{-2} \log m \log |P|)$ bounds. Smaller coresets reduce the work needed to estimate clustering cost on large inputs.
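To make the sensitivity-sampling ingredient concrete, here is a minimal Python sketch for Euclidean points. It is not the paper's construction: the function names, the `rough_centers` input (a stand-in for any approximate solution), and the sensitivity proxy (cost share plus cluster share) are assumptions chosen for illustration, and the sample size `m` is left to the caller rather than derived from a VC-dimension bound.

```python
import math
import random


def kmedian_cost(weighted_points, centers):
    """Weighted k-median cost: sum of w * (distance to the nearest center)."""
    return sum(w * min(math.dist(p, c) for c in centers) for p, w in weighted_points)


def sensitivity_sample(points, rough_centers, m, rng):
    """Draw m weighted points with probability proportional to a sensitivity proxy.

    Hypothetical helper: rough_centers is any approximate solution (assumption).
    """
    # Assign each point to its nearest rough center.
    nearest = [
        min(range(len(rough_centers)), key=lambda j: math.dist(p, rough_centers[j]))
        for p in points
    ]
    sizes = [0] * len(rough_centers)
    for j in nearest:
        sizes[j] += 1
    total = kmedian_cost([(p, 1.0) for p in points], rough_centers)

    # Proxy score: the point's share of the rough cost plus its share of its cluster.
    scores = []
    for p, j in zip(points, nearest):
        cost_share = math.dist(p, rough_centers[j]) / total if total > 0 else 0.0
        scores.append(cost_share + 1.0 / sizes[j])
    z = sum(scores)
    probs = [s / z for s in scores]

    # Importance sampling with replacement; the weight 1 / (m * prob)
    # makes the weighted cost an unbiased estimate of the full cost.
    picked = rng.choices(range(len(points)), weights=probs, k=m)
    return [(points[i], 1.0 / (m * probs[i])) for i in picked]
```

A caller would pass the points, a rough solution, a sample size, and a `random.Random` instance, then evaluate `kmedian_cost` on the returned weighted sample instead of on the full point set.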

📝 Abstract
We consider coresets for $k$-clustering problems, where the goal is to assign points to centers minimizing powers of distances. A popular example is the $k$-median objective $\sum_{p\in P}\min_{c\in C}\mathrm{dist}(p,c)$. Given a point set $P$, a coreset $\Omega$ is a small weighted subset that approximates the cost of $P$ for all candidate solutions $C$ up to a $(1\pm\varepsilon)$ multiplicative factor. In this paper, we give a sharp VC-dimension based analysis for coreset construction. As a consequence, we obtain improved $k$-median coreset bounds for the following metrics: (1) Coresets of size $\tilde{O}\left(k\varepsilon^{-2}\right)$ for shortest path metrics in planar graphs, improving over the bounds $\tilde{O}\left(k\varepsilon^{-6}\right)$ by [Cohen-Addad, Saulpic, Schwiegelshohn, STOC'21] and $\tilde{O}\left(k^2\varepsilon^{-4}\right)$ by [Braverman, Jiang, Krauthgamer, Wu, SODA'21]. (2) Coresets of size $\tilde{O}\left(kd\ell\varepsilon^{-2}\log m\right)$ for clustering $d$-dimensional polygonal curves of length at most $m$ with curves of length at most $\ell$ with respect to Fréchet metrics, improving over the bounds $\tilde{O}\left(k^3d\ell\varepsilon^{-3}\log m\right)$ by [Braverman, Cohen-Addad, Jiang, Krauthgamer, Schwiegelshohn, Toftrup, and Wu, FOCS'22] and $\tilde{O}\left(k^2d\ell\varepsilon^{-2}\log m \log |P|\right)$ by [Conradi, Kolbe, Psarros, Rohde, SoCG'24].
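The coreset definition above can be spot-checked numerically. The sketch below is an illustration under stated assumptions, not part of the paper: it uses Euclidean points, hypothetical function names, and a finite list of candidate solutions, whereas the definition quantifies over all candidate solutions $C$. It reports the largest relative deviation between the weighted coreset cost and the full $k$-median cost; a valid coreset keeps this at most $\varepsilon$ for every $C$.

```python
import math


def kmedian_cost(weighted_points, centers):
    """Weighted k-median cost: sum of w * (distance to the nearest center)."""
    return sum(w * min(math.dist(p, c) for c in centers) for p, w in weighted_points)


def max_relative_error(points, coreset, candidate_solutions):
    """Largest |cost(coreset, C) - cost(P, C)| / cost(P, C) over the given candidates.

    Hypothetical helper: it checks only the listed candidates, so a small
    value is evidence for, not proof of, the (1 +/- eps) guarantee.
    """
    full = [(p, 1.0) for p in points]  # the input set carries unit weights
    worst = 0.0
    for centers in candidate_solutions:
        true_cost = kmedian_cost(full, centers)
        approx_cost = kmedian_cost(coreset, centers)
        if true_cost == 0:
            err = 0.0 if approx_cost == 0 else math.inf
        else:
            err = abs(approx_cost - true_cost) / true_cost
        worst = max(worst, err)
    return worst
```

For example, with `points = [(0.0,), (2.0,)]` and the one-point candidate coreset `[((1.0,), 2.0)]`, the far-away solution `[(5.0,)]` gives error `0.0` (both costs are 8), while the solution `[(1.0,)]` gives error `1.0` (full cost 2, coreset cost 0). This shows why the guarantee must hold for all candidate solutions and not just one.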
Problem

Research questions and friction points this paper is trying to address.

k-clustering
coreset construction
cost estimation
Innovation

Methods, ideas, or system contributions that make the work stand out.

VC Dimension
Coreset Optimization
k-Median Clustering