Provable Imbalanced Point Clustering

📅 2024-08-26
🏛️ International Conference on Cyber Security Cryptography and Machine Learning
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the $k$-center clustering problem in high-dimensional spaces under severe class imbalance. Existing methods suffer from theoretical gaps and computational inefficiency. To bridge this gap, we propose the first core-set construction method with provable error bounds and introduce Choice Clustering, a novel framework integrating geometric approximation, weighted sampling, and combinatorial optimization. Our approach constructs a lightweight, weighted core-set that drastically reduces data size while preserving a $(1+\varepsilon)$-approximation to the original clustering objective. Extensive experiments on real-world image datasets, synthetic benchmarks, and imbalanced real-world data demonstrate that our method achieves an average 12.7% improvement in clustering accuracy, runs 3.8× faster than state-of-the-art baselines, and exhibits strong robustness to class skew. The framework thus establishes a new paradigm for large-scale imbalanced clustering, combining rigorous theoretical guarantees with practical efficiency.

📝 Abstract
We suggest efficient and provable methods to compute an approximation for imbalanced point clustering, that is, fitting $k$-centers to a set of points in $\mathbb{R}^d$, for any $d,k\geq 1$. To this end, we utilize coresets, which, in the context of the paper, are essentially weighted sets of points in $\mathbb{R}^d$ that approximate the fitting loss for every model in a given set, up to a multiplicative factor of $1\pm\varepsilon$. We provide [Section 3 and Section E in the appendix] experiments that show the empirical contribution of our suggested methods for real images (novel and reference), synthetic data, and real-world data. We also propose choice clustering, which by combining clustering algorithms yields better performance than each one separately.
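To make the coreset property in the abstract concrete, here is a minimal Python sketch. It is not the paper's construction. It assumes a sum-of-squared-distances loss (the paper's own loss may differ), and it builds the simplest possible weighted set by merging duplicate points, which reproduces the loss exactly (the $\varepsilon = 0$ case). The helper names `weighted_loss` and `is_eps_coreset_for` are hypothetical.

```python
from collections import Counter


def nearest_sq_dist(p, centers):
    """Squared Euclidean distance from point p to its nearest center."""
    return min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)


def weighted_loss(points, weights, centers):
    """Assumed fitting loss: weighted sum of squared distances to the nearest center."""
    return sum(w * nearest_sq_dist(p, centers) for p, w in zip(points, weights))


def is_eps_coreset_for(points, core_points, core_weights, centers, eps):
    """Check the (1 +/- eps) property for ONE candidate set of centers.

    A true coreset must satisfy this for every model in the given set;
    this function only tests a single model.
    """
    full = weighted_loss(points, [1.0] * len(points), centers)
    approx = weighted_loss(core_points, core_weights, centers)
    return abs(full - approx) <= eps * full


# Toy imbalanced input: one large cluster, one small cluster, one outlier.
P = [(0.0, 0.0)] * 90 + [(10.0, 10.0)] * 9 + [(50.0, 50.0)]

# Trivial exact "coreset": each distinct point, weighted by its multiplicity.
counts = Counter(P)
S = list(counts.keys())
w = [float(c) for c in counts.values()]

centers = [(1.0, 1.0), (40.0, 40.0)]
full_loss = weighted_loss(P, [1.0] * len(P), centers)
core_loss = weighted_loss(S, w, centers)
print(len(P), len(S), full_loss, core_loss)
```

Here 100 input points collapse to 3 weighted points with the same loss for any choice of centers. Real coreset constructions aim for the same kind of reduction on data without exact duplicates, at the price of a small error $\varepsilon > 0$.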
Problem

Research questions and friction points this paper is trying to address.

Efficient approximation for imbalanced point clustering
Utilizing coresets to approximate fitting loss
Improving clustering performance via choice clustering
Innovation

Methods, ideas, or system contributions that make the work stand out.

Utilizes coresets for efficient clustering approximation
Combines clustering algorithms for improved performance
Validates methods with real and synthetic data experiments
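The summary does not spell out how choice clustering combines algorithms. One plausible reading, shown purely as an assumption, is to run several candidate algorithms and keep the output with the lowest loss, which by construction is never worse than any single candidate. The function `choice_clustering`, the two toy candidates, and the squared-distance loss are all hypothetical stand-ins rather than the authors' method.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def loss(points, centers):
    """Assumed loss: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def first_k(points, k):
    """Naive candidate: use the first k points as centers."""
    return list(points[:k])


def farthest_first(points, k):
    """Greedy candidate: repeatedly add the point farthest from the current centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    return centers


def choice_clustering(points, k, algorithms):
    """Run every candidate algorithm and return the centers with the lowest loss."""
    candidates = [alg(points, k) for alg in algorithms]
    return min(candidates, key=lambda c: loss(points, c))


# Imbalanced toy input: three nearby points and one distant outlier.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (100.0, 100.0)]
best = choice_clustering(points, 2, [first_k, farthest_first])
print(best, loss(points, best))
```

On this input the naive candidate ignores the outlier, so the selection step returns the farthest-first centers instead.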
David Denisov
University of Haifa, Haifa, Israel
Dan Feldman
University of Haifa
Provable Data Summarization, Machine/Deep Learning, Computer Vision, Computational Geometry
S. Dolev
Ben-Gurion University of the Negev, Beer-Sheva, Israel
Michael Segal
Ben-Gurion University of the Negev, Beer-Sheva, Israel