🤖 AI Summary
Kernel Power k-Means (KPKM) has two main weaknesses: high computational cost, because it relies on the full kernel matrix, and limited noise robustness, because it does not learn genuine centroid-sample assignments. To address these limitations, we propose RFF-KPKM, which provides the first approximation theory for applying Random Fourier Features (RFF) to KPKM, including an excess risk bound, strong consistency, and a relative error bound. Building on this, we introduce IP-RFF-MKPKM, which combines possibilistic and fuzzy memberships within a multiple kernel learning framework to achieve robust and scalable clustering. The approach avoids explicit construction of the kernel matrix and instead optimizes over low-dimensional RFF mappings. Experiments on large-scale datasets show that the proposed methods outperform state-of-the-art baselines in both clustering accuracy and computational efficiency.
📝 Abstract
Kernel power $k$-means (KPKM) leverages a family of means to mitigate local minima issues in kernel $k$-means. However, KPKM faces two key limitations: (1) the computational burden of the full kernel matrix restricts its use on large datasets, and (2) the lack of authentic centroid-sample assignment learning reduces its noise robustness. To overcome these challenges, we propose RFF-KPKM, introducing the first approximation theory for applying random Fourier features (RFF) to KPKM. RFF-KPKM employs RFF to generate efficient, low-dimensional feature maps, bypassing the need for the whole kernel matrix. Crucially, we are the first to establish strong theoretical guarantees for this combination: (1) an excess risk bound of $\mathcal{O}(\sqrt{k^3/n})$, (2) strong consistency with membership values, and (3) a $(1+\varepsilon)$ relative error bound achievable using RFF of dimension $\mathrm{poly}(\varepsilon^{-1}\log k)$. Furthermore, to improve robustness and the ability to learn multiple kernels, we propose IP-RFF-MKPKM, an improved possibilistic RFF-based multiple kernel power $k$-means. IP-RFF-MKPKM ensures the scalability of MKPKM via RFF and refines cluster assignments by combining the merits of possibilistic membership and fuzzy membership. Experiments on large-scale datasets demonstrate the superior efficiency and clustering accuracy of the proposed methods compared to state-of-the-art alternatives.
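To illustrate how RFF can stand in for the full kernel matrix, here is a minimal sketch of the standard random Fourier feature construction for a Gaussian kernel. It is not the authors' implementation: the function names `rff_features` and `gaussian_kernel`, the bandwidth `sigma`, and the toy data are all assumptions made for this example. The exact kernel matrix is computed only to check the approximation on a small sample.

```python
import numpy as np


def rff_features(X, num_features, sigma, rng):
    """Map X (n, d) to Z (n, num_features) so that Z @ Z.T approximates
    the Gaussian kernel exp(-||x - y||^2 / (2 * sigma^2)).

    Hypothetical helper for illustration, not taken from the paper.
    """
    d = X.shape[1]
    # Frequencies drawn from the kernel's spectral density, N(0, sigma^-2 I).
    W = rng.normal(scale=1.0 / sigma, size=(d, num_features))
    # Random phases drawn uniformly from [0, 2*pi).
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)


def gaussian_kernel(X, sigma):
    """Exact n x n Gaussian kernel matrix, used here only as a reference."""
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma**2))


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))          # toy data, assumed for the example
Z = rff_features(X, num_features=2000, sigma=2.0, rng=rng)

K_exact = gaussian_kernel(X, sigma=2.0)
K_approx = Z @ Z.T                     # formed only to measure the error
max_err = np.max(np.abs(K_exact - K_approx))
print(f"max absolute kernel error: {max_err:.3f}")
```

In a clustering loop, only `Z` would be kept: its size is n times the feature dimension, while the kernel matrix has n times n entries, so distances to centroids can be computed in the feature space without ever forming `K_approx`.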