A Tutorial on Discriminative Clustering and Mutual Information

📅 2025-05-07
🤖 AI Summary
This tutorial surveys discriminative clustering along three threads: the shift in modeling assumptions from explicit decision boundaries to invariance critics, the historical but problematic role of mutual information (MI), and the persistent difficulty of selecting the number of clusters. It reviews known limitations of MI as a clustering objective and how later discriminative methods tried to circumvent them, with particular attention to deep clustering methods that use neural networks on high-dimensional data. The paper closes by showcasing these techniques with GemClus, a Python package the authors developed for discriminative clustering.

📝 Abstract
To cluster data is to separate samples into distinctive groups that should ideally have some cohesive properties. Today, numerous clustering algorithms exist, and their differences lie essentially in what can be perceived as "cohesive properties". Therefore, hypotheses on the nature of clusters must be set: they can be either generative or discriminative. As the last decade witnessed the impressive growth of deep clustering methods that use neural networks to handle high-dimensional data, often in a discriminative manner, we concentrate mainly on the discriminative hypotheses. In this paper, our aim is to provide an accessible historical perspective on the evolution of discriminative clustering methods, and notably on how the assumptions behind discriminative models changed over time: from decision boundaries to invariance critics. We highlight how mutual information has been a historical cornerstone of the progress of (deep) discriminative clustering methods. We also show some known limitations of mutual information and how discriminative clustering methods tried to circumvent them. We then discuss the challenges that discriminative clustering faces with respect to the selection of the number of clusters. Finally, we showcase these techniques using the dedicated Python package, GemClus, that we have developed for discriminative clustering.
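
The abstract does not give a formula for mutual information, so the following is only a minimal sketch of the standard plug-in estimate, I(X; Y) = H(Y) - H(Y|X), computed from soft cluster assignments p(y | x_i). The function name `mutual_information`, the `eps` smoothing constant, and the toy assignment matrices are assumptions made for illustration; this is not the GemClus API.

```python
import numpy as np


def mutual_information(probs, eps=1e-12):
    """Plug-in estimate of I(X; Y) from soft assignments.

    probs: array of shape (n_samples, n_clusters); row i holds p(y | x_i)
           and is assumed to sum to 1.
    eps:   small constant (an assumption of this sketch) to avoid log(0).
    """
    probs = np.asarray(probs, dtype=float)
    # Cluster proportions p(y), estimated by averaging over the samples.
    marginal = probs.mean(axis=0)
    # H(Y): entropy of the cluster proportions (high when clusters are balanced).
    h_marginal = -np.sum(marginal * np.log(marginal + eps))
    # H(Y|X): average entropy of each sample's assignment (low when confident).
    h_conditional = -np.mean(np.sum(probs * np.log(probs + eps), axis=1))
    return h_marginal - h_conditional


# Toy assignment matrices (hypothetical data, 4 samples and 2 clusters).
confident = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
uniform = np.full((4, 2), 0.5)
collapsed = np.array([[1.0, 0.0]] * 4)

print(mutual_information(confident))  # close to log(2): balanced and confident
print(mutual_information(uniform))    # close to 0: no sample is confidently assigned
print(mutual_information(collapsed))  # close to 0: every sample is in one cluster
```

In this sketch the score depends only on the assignment matrix, so any confident and balanced partition reaches the same value regardless of how the samples are laid out in feature space, which is one way to see why mutual information alone can be an incomplete clustering objective.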

Problem

Research questions and friction points this paper is trying to address.

Exploring evolution of discriminative clustering methods
Analyzing limitations of mutual information in clustering
Addressing challenges in selecting number of clusters

Innovation

Methods, ideas, or system contributions that make the work stand out.

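Surveys discriminative clustering, with emphasis on deep methods that use neural networks
Traces mutual information as a historical cornerstone of discriminative clustering and reviews its known limitations
Showcases the techniques with GemClus, a Python package developed by the authors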