Hierarchical clustering with maximum density paths and mixture models

📅 2025-03-19
🤖 AI Summary
In high-dimensional data without prominent density gaps, conventional hierarchical clustering struggles to uncover nested cluster structure. This paper proposes a two-stage, density-guided hierarchical clustering framework. First, a Gaussian or Student's t mixture model over-clusters the data into fine-grained initial clusters. Second, maximum-density paths through the induced density landscape guide bottom-up agglomerative merging of those clusters. The approach integrates parametric mixture-model density estimation with density-path-driven agglomeration, aiming for both scale adaptivity and semantic interpretability. On multiple high-dimensional benchmark datasets it reports state-of-the-art clustering performance and produces well-nested, structurally coherent dendrograms. The implementation is publicly available.
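To make the merging criterion concrete, one common way to formalize a "maximum-density path" score is the minimax (bottleneck) form below. This is a reading of the summary, not the paper's stated objective; the notation (mixture weights π_k, component densities f_k, means μ_k, score s) is introduced here for illustration, and the paper's path parameterization and optimizer may differ.

```latex
p(x) = \sum_{k=1}^{K} \pi_k \, f_k(x), \qquad
f_k(x) \in \bigl\{ \mathcal{N}(x;\, \mu_k, \Sigma_k),\; t_{\nu_k}(x;\, \mu_k, \Sigma_k) \bigr\}

s(i, j) = \max_{\gamma:\ \gamma(0) = \mu_i,\ \gamma(1) = \mu_j} \ \min_{\tau \in [0, 1]} \ p\bigl(\gamma(\tau)\bigr)

s(i, k) \ \ge\ \min\bigl\{ s(i, j),\ s(j, k) \bigr\}
```

The last inequality holds because joining a path from μ_i to μ_j with a path from μ_j to μ_k gives a path from μ_i to μ_k whose lowest density is the smaller of the two bottlenecks. Under this definition, merging the pair with the highest score first behaves like single-linkage agglomeration on the scores, and two components share a branch at level λ exactly when some path between their means never drops below density λ.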

📝 Abstract
Hierarchical clustering is an effective and interpretable technique for analyzing structure in data, offering a nuanced understanding by revealing insights at multiple scales and resolutions. It is particularly helpful in settings where the exact number of clusters is unknown, and provides a robust framework for exploring complex datasets. Additionally, hierarchical clustering can uncover inner structures within clusters, capturing subtle relationships and nested patterns that may be obscured by traditional flat clustering methods. However, existing hierarchical clustering methods struggle with high-dimensional data, especially when there are no clear density gaps between modes. Our method addresses this limitation by leveraging a two-stage approach, first employing a Gaussian or Student's t mixture model to overcluster the data, and then hierarchically merging clusters based on the induced density landscape. This approach yields state-of-the-art clustering performance while also providing a meaningful hierarchy, making it a valuable tool for exploratory data analysis. Code is available at https://github.com/ecker-lab/tneb_clustering.
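The abstract allows Student's t components in place of Gaussians. The sketch below evaluates the log-density of a Student's t mixture, the kind of density landscape a merging stage would operate on. It is a NumPy illustration, not code from the authors' repository: the function names, full covariance matrices, and per-component degrees of freedom `nus` are assumptions.

```python
import math
import numpy as np

def mvt_logpdf(x, mu, cov, nu):
    """Log-density of a multivariate Student's t distribution at a single point x."""
    d = mu.shape[0]
    diff = x - mu
    # Mahalanobis term via a linear solve instead of an explicit matrix inverse
    delta = float(diff @ np.linalg.solve(cov, diff))
    _, logdet = np.linalg.slogdet(cov)
    return (math.lgamma((nu + d) / 2) - math.lgamma(nu / 2)
            - 0.5 * d * math.log(nu * math.pi) - 0.5 * logdet
            - 0.5 * (nu + d) * math.log1p(delta / nu))

def gaussian_logpdf(x, mu, cov):
    """Log-density of a multivariate Gaussian, for comparison."""
    d = mu.shape[0]
    diff = x - mu
    delta = float(diff @ np.linalg.solve(cov, diff))
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * math.log(2 * math.pi) + logdet + delta)

def mixture_logpdf(x, weights, mus, covs, nus):
    """log sum_k w_k * t_{nu_k}(x; mu_k, cov_k), computed stably with log-sum-exp."""
    terms = np.array([math.log(w) + mvt_logpdf(x, m, c, n)
                      for w, m, c, n in zip(weights, mus, covs, nus)])
    top = terms.max()
    return float(top + math.log(np.exp(terms - top).sum()))

# Six standard deviations out, the t tail (nu = 3) is far heavier than the Gaussian one.
x = np.array([6.0])
mu, cov = np.zeros(1), np.eye(1)
print(mvt_logpdf(x, mu, cov, nu=3.0), gaussian_logpdf(x, mu, cov))
```

Working in log space matters in high dimensions, where individual component densities easily underflow to zero. The heavier t tails also keep the density between distant components from vanishing as quickly as it does under a Gaussian, since it decays polynomially rather than exponentially.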
Problem

Research questions and friction points this paper is trying to address.

Addresses the difficulty of hierarchical clustering in high-dimensional data
Handles data whose modes are not separated by clear density gaps
Improves clustering performance through a two-stage hierarchical approach
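The missing-density-gap problem is visible even in one dimension. For an equal-weight mixture of two Gaussians with common standard deviation σ, the density dips between the means only when they are more than 2σ apart; closer than that, two genuine components produce a single density peak, so a method that looks for density gaps sees one cluster. The toy check below is a hypothetical illustration, not from the paper (which targets high-dimensional data); it counts local maxima of the mixture density on a grid.

```python
import numpy as np

def mixture_1d(x, d, sigma=1.0):
    """Equal-weight mixture of N(-d/2, sigma^2) and N(+d/2, sigma^2)."""
    def normal_pdf(m):
        return np.exp(-0.5 * ((x - m) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return 0.5 * normal_pdf(-d / 2) + 0.5 * normal_pdf(d / 2)

def count_modes(d, sigma=1.0):
    """Count local maxima of the mixture density on a fine grid."""
    x = np.linspace(-10.0, 10.0, 4001)
    p = mixture_1d(x, d, sigma)
    # an interior grid point is a peak if it is strictly above both neighbours
    is_peak = (p[1:-1] > p[:-2]) & (p[1:-1] > p[2:])
    return int(is_peak.sum())

for separation in (1.0, 3.0):
    print(f"means {separation} sigma apart -> {count_modes(separation)} mode(s)")
```

Separations of exactly 2σ are avoided here because the density top is then extremely flat, which makes a grid-based peak count fragile.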
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage hierarchical clustering approach
Over-clustering with a Gaussian or Student's t mixture model
Hierarchical merging based on density landscape
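The merging stage can be sketched as follows, under simplifying assumptions that are ours, not the paper's: components are isotropic Gaussians, the path between two component means is the straight segment (the paper instead constructs maximum-density paths, which can follow curved high-density ridges), and merges follow Kruskal-style single linkage on the resulting scores. All function names are hypothetical.

```python
import numpy as np

def gmm_density(x, weights, means, variances):
    """Density of an isotropic Gaussian mixture at points x of shape (n, d)."""
    x = np.atleast_2d(x)
    d = x.shape[1]
    # squared distance from every point to every component mean: shape (n, K)
    sq = ((x[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)
    norm = (2 * np.pi * variances) ** (-d / 2)  # per-component normaliser, shape (K,)
    comp = norm[None, :] * np.exp(-0.5 * sq / variances[None, :])
    return comp @ weights

def straight_path_density(mu_a, mu_b, weights, means, variances, n_steps=100):
    """Lowest mixture density sampled on the segment mu_a -> mu_b (stand-in for an optimised path)."""
    t = np.linspace(0.0, 1.0, n_steps)[:, None]
    path = (1 - t) * mu_a + t * mu_b
    return float(gmm_density(path, weights, means, variances).min())

def merge_order(weights, means, variances):
    """Merge components greedily, densest connection first (single linkage on path scores)."""
    K = len(weights)
    parent = list(range(K))

    def find(i):
        # union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # score every pair of components by the density of the path between their means
    pairs = [(straight_path_density(means[i], means[j], weights, means, variances), i, j)
             for i in range(K) for j in range(i + 1, K)]
    pairs.sort(reverse=True)
    merges = []
    for score, i, j in pairs:
        ri, rj = find(i), find(j)
        if ri != rj:  # skip pairs already in the same subtree
            parent[rj] = ri
            merges.append((i, j, score))
    return merges

# Toy example: two overlapping components and one far away.
weights = np.full(3, 1 / 3)
means = np.array([[0.0, 0.0], [1.5, 0.0], [10.0, 0.0]])
variances = np.ones(3)
merges = merge_order(weights, means, variances)
for i, j, score in merges:
    print(f"merge subtrees containing {i} and {j} at path density {score:.2e}")
```

Each entry of `merges` joins the subtrees containing components i and j, so the list is the dendrogram's merge sequence from the bottom up. Because the straight segment is only one candidate path, its sampled minimum approximates a lower bound on the maximum-density path score (up to the sampling of the segment); swapping `straight_path_density` for a path optimizer would tighten the scores without changing the merge loop.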
Martin Ritzert
Georg-August-Universität Göttingen
Theoretical Machine Learning, Graph Learning, Clustering, Complexity, Logic
Polina Turishcheva
Department of Computer Science, University of Göttingen, Göttingen, Germany
Laura Hansel
Department of Computer Science, University of Göttingen, Göttingen, Germany
Paul Wollenhaupt
Department of Computer Science, University of Göttingen, Göttingen, Germany
Marissa Weis
Department of Computer Science, University of Göttingen, Göttingen, Germany
Alexander S. Ecker
University of Göttingen, Germany
Computational Neuroscience, Vision, Machine Learning, Computer Vision, Data Science