🤖 AI Summary
Real-world data often exhibit hierarchical structure, yet existing deep hierarchical clustering methods suffer from poor scalability and limited performance. This paper proposes a lightweight, fine-tuning-free post-hoc framework that directly constructs high-quality hierarchical trees from logits produced by arbitrary pre-trained models, including unsupervised clusterers and ImageNet classifiers. Methodologically, it introduces two key components: (i) a logit-based spectral clustering variant, and (ii) a gradient-free hierarchical agglomerative algorithm that reconstructs feature-space similarity via logit distillation. Crucially, the authors provide the first theoretical and empirical evidence that logit distillation outperforms complex end-to-end hierarchical modeling. The approach surpasses dedicated deep hierarchical clustering models across multiple benchmarks, reduces computational overhead by 10×, and, uniquely, enables general-purpose, semantically consistent hierarchical discovery in both unsupervised and supervised settings.
📝 Abstract
The structure of many real-world datasets is intrinsically hierarchical, making the modeling of such hierarchies a critical objective in both unsupervised and supervised machine learning. Recently, novel approaches for hierarchical clustering with deep architectures have been proposed. In this work, we take a critical perspective on this line of research and demonstrate that many approaches exhibit major limitations when applied to realistic datasets, partly due to their high computational complexity. In particular, we show that a lightweight procedure implemented on top of pre-trained non-hierarchical clustering models outperforms models designed specifically for hierarchical clustering. Our proposed approach is computationally efficient and applicable to any pre-trained clustering model that outputs logits, without requiring any fine-tuning. To highlight the generality of our findings, we illustrate how our method can also be applied in a supervised setup, recovering meaningful hierarchies from a pre-trained ImageNet classifier.
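As a rough illustration of the general idea described above, and not the paper's exact algorithm, a hierarchy can be built post hoc by running standard agglomerative clustering on the class-logit (or class-probability) vectors emitted by any pre-trained model; the random logits below are a hypothetical stand-in for real model outputs:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical stand-in for logits from a pre-trained clustering
# model or classifier: 100 samples, 10 output classes.
rng = np.random.default_rng(0)
logits = rng.normal(size=(100, 10))

# Softmax turns logits into class-probability vectors.
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)

# Average-linkage agglomerative clustering on the probability
# vectors yields a full hierarchy (dendrogram) with no training
# or fine-tuning of the underlying model.
Z = linkage(probs, method="average", metric="cosine")

# The tree can be cut at any level to obtain flat clusters.
labels = fcluster(Z, t=5, criterion="maxclust")
```

This sketch uses off-the-shelf SciPy linkage rather than the paper's spectral or distillation-based components, but it conveys the post-hoc, fine-tuning-free workflow: model logits in, hierarchy out.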