🤖 AI Summary
This paper addresses two key challenges in unsupervised deep clustering: the computational intractability of graph ratio-cut optimization and its weak alignment with learned similarity metrics. To this end, the authors propose the Probabilistic Ratio-Cut (PRCut) framework, which models cluster assignments as stochastic binary variables and performs end-to-end optimization of the expected ratio-cut objective. They derive a differentiable upper bound on the expected ratio-cut and an unbiased gradient estimator, overcoming the limitations of conventional Rayleigh-quotient relaxation and deterministic discrete optimization. PRCut supports online learning and integrates naturally with self-supervised representation learning. Experiments show that PRCut significantly outperforms classical graph-cut relaxations and state-of-the-art deep clustering methods across multiple benchmarks; under label-guided similarity, its performance matches that of supervised classifiers. PRCut can also serve as a plug-and-play tool for evaluating the quality of self-supervised representations.
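For context, the objective being relaxed is the classical graph ratio-cut. A standard formulation (the paper's exact bound and estimator may differ in its details) is:

$$
\mathrm{RCut}(C_1,\dots,C_K) \;=\; \sum_{k=1}^{K} \frac{\mathrm{cut}(C_k,\bar C_k)}{|C_k|},
\qquad
\mathrm{cut}(C_k,\bar C_k) \;=\; \sum_{i \in C_k}\sum_{j \notin C_k} w_{ij},
$$

where $w_{ij}$ is the pairwise similarity. The probabilistic view replaces the hard partition with random assignments $y_i \sim \mathrm{Cat}(\pi_\theta(x_i))$ and minimizes the expected objective

$$
\min_{\theta}\; \mathbb{E}_{y \sim p_\theta}\big[\mathrm{RCut}(y)\big],
$$

which the paper optimizes via a differentiable upper bound and an unbiased gradient estimate, rather than the usual Rayleigh-quotient (spectral) relaxation.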
📝 Abstract
We propose a novel approach for optimizing the graph ratio-cut by modeling the binary assignments as random variables. We provide an upper bound on the expected ratio-cut, as well as an unbiased estimate of its gradient, to learn the parameters of the assignment variables in an online setting. The clustering resulting from our probabilistic approach (PRCut) outperforms the Rayleigh quotient relaxation of the combinatorial problem, its online learning extensions, and several widely used methods. We demonstrate that the PRCut clustering closely aligns with the similarity measure and can perform as well as a supervised classifier when label-based similarities are provided. This novel approach can leverage out-of-the-box self-supervised representations to achieve competitive performance and serve as an evaluation method for the quality of these representations.
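To make the setup concrete, below is a minimal, hypothetical sketch of the probabilistic objective: hard assignments are sampled from per-node categorical distributions, the ratio-cut of each sample is evaluated, and a score-function (REINFORCE-style) estimator gives an unbiased gradient of the expected ratio-cut with respect to the assignment logits. This is a generic illustration of the idea under standard assumptions, not the paper's actual bound or estimator; all function names and the toy graph are invented for the example.

```python
import numpy as np

def ratio_cut(W, z, K):
    """Ratio-cut of a hard assignment z (z[i] in {0, ..., K-1})."""
    total = 0.0
    for k in range(K):
        in_k = (z == k)
        size = in_k.sum()
        if size == 0:
            continue
        # Sum of edge weights leaving cluster k, normalized by its size.
        total += W[np.ix_(in_k, ~in_k)].sum() / size
    return total

def expected_ratio_cut_grad(W, logits, K, n_samples=64, rng=None):
    """Unbiased score-function estimate of d E[RatioCut] / d logits.

    Samples assignments from Cat(softmax(logits[i])) per node and
    averages f(sample) * grad log p(sample); for a softmax, that
    per-sample gradient is (onehot(z) - p).
    """
    rng = np.random.default_rng(rng)
    n = logits.shape[0]
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    grad = np.zeros_like(logits)
    values = []
    for _ in range(n_samples):
        z = np.array([rng.choice(K, p=p[i]) for i in range(n)])
        f = ratio_cut(W, z, K)
        values.append(f)
        grad += f * (np.eye(K)[z] - p)
    return grad / n_samples, float(np.mean(values))
```

On a small graph with two tight communities, the ratio-cut of the community-respecting assignment is much lower than that of a mixed one, and the estimated gradient pushes the logits toward lower expected cut; the paper's contribution is a differentiable upper bound and estimator that make this optimization practical online and at scale.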