🤖 AI Summary
This paper addresses the recovery of structured projections from high-dimensional data in the small-sample regime, targeting projections that contain either imbalanced clusters or a Bernoulli–Rademacher distribution. We propose a projection pursuit method that optimizes the projection index with a gradient-based technique. Theoretically, we analyze the method's sample complexity in a planted vector setting, where imbalanced clusters turn out to be *easier* to recover than balanced ones, and we give a generalized result that covers a variety of data distributions and projection indices. These results are compared against computational lower bounds in the low-degree polynomial framework. Empirical evaluation on FashionMNIST and the Human Activity Recognition dataset shows that the method outperforms competing approaches when only a few samples are available. Our work thus provides both sample complexity guarantees and a practical algorithm for unsupervised structure discovery in high-dimensional data.
📝 Abstract
Projection pursuit is a classic exploratory technique for finding interesting projections of a dataset. We propose a method for recovering projections that contain either imbalanced clusters or a Bernoulli-Rademacher distribution, using a gradient-based technique to optimize the projection index. Because sample complexity is a major limiting factor in projection pursuit, we analyze our algorithm's sample complexity in a planted vector setting, where we observe that imbalanced clusters can be recovered more easily than balanced ones. Additionally, we give a generalized result that holds for a variety of data distributions and projection indices. We compare these results to computational lower bounds in the low-degree polynomial framework. Finally, we experimentally evaluate our method's applicability to real-world data using FashionMNIST and the Human Activity Recognition dataset, where our algorithm outperforms competing methods when only a few samples are available.
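To make the planted vector setting concrete, the following is a minimal sketch and not the paper's implementation. It assumes a third-moment (skewness) projection index, a fixed step size, and random restarts; the function names `sample_planted`, `skewness_index`, and `pursue` and all parameter values are hypothetical choices made for illustration. The data are standard Gaussian in every direction except a hidden unit vector `u`, along which the samples form two imbalanced clusters standardized to mean 0 and variance 1.

```python
import numpy as np


def sample_planted(n, d, p, rng):
    """Draw n points in d dimensions: Gaussian noise everywhere except along a
    hidden unit vector u, where the coordinate is a two-point (two-cluster)
    variable with cluster weights p and 1 - p, mean 0 and variance 1."""
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)
    a = np.sqrt((1 - p) / p)    # location of the small cluster (weight p)
    b = -np.sqrt(p / (1 - p))   # location of the large cluster (weight 1 - p)
    z = np.where(rng.random(n) < p, a, b)
    g = rng.standard_normal((n, d))
    # Remove the Gaussian component along u and plant the cluster variable.
    x = g - np.outer(g @ u, u) + np.outer(z, u)
    return x, u


def skewness_index(x, w):
    """Assumed projection index: empirical third moment of the projection."""
    return np.mean((x @ w) ** 3)


def pursue(x, rng, restarts=20, steps=200, lr=0.5):
    """Gradient ascent on the unit sphere with random restarts; returns the
    direction with the largest index value and that value."""
    n, d = x.shape
    best_w, best_val = None, -np.inf
    for _ in range(restarts):
        w = rng.standard_normal(d)
        w /= np.linalg.norm(w)
        for _ in range(steps):
            # Gradient of mean((x @ w)**3) with respect to w.
            grad = 3.0 * x.T @ ((x @ w) ** 2) / n
            w = w + lr * grad
            w /= np.linalg.norm(w)  # project back onto the unit sphere
        val = skewness_index(x, w)
        if val > best_val:
            best_w, best_val = w, val
    return best_w, best_val


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, u = sample_planted(n=20000, d=5, p=0.1, rng=rng)
    w, val = pursue(x, rng)
    print("alignment |<w, u>|:", abs(w @ u))
    print("index value:", val)
```

A balanced two-point variable (p = 0.5) has zero third moment, so this particular index carries no signal for balanced clusters, and an even-order index such as a fourth-moment one would be needed instead. The sketch is meant only to show the mechanics of optimizing a projection index by gradient steps; it does not reproduce the paper's index, its sample complexity analysis, or its treatment of the Bernoulli-Rademacher case.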