Hierarchical Linkage Clustering Beyond Binary Trees and Ultrametrics

📅 2025-11-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traditional hierarchical clustering suffers from three fundamental limitations: (i) it returns a hierarchy even when the underlying data has no nested structure; (ii) it restricts the hierarchy to binary trees; and (iii) it is highly sensitive to the choice of linkage function. This paper introduces the notion of a valid hierarchy, defines a partial order over valid hierarchies, and proves that a finest valid hierarchy exists: the one that encodes the maximum information consistent with the similarity structure of the data. The finest valid hierarchy may be non-binary and collapses to a star tree when no genuine hierarchy exists. The method is a two-step procedure: first construct a classical binary tree with a linkage method, then prune it to enforce validity. The paper gives necessary and sufficient conditions on the linkage function under which this procedure exactly recovers the finest valid hierarchy, and all linkage functions meeting these conditions yield the same pruned hierarchy. Single, complete, and average linkage satisfy the conditions; Ward's linkage does not.

📝 Abstract
Hierarchical clustering seeks to uncover nested structures in data by constructing a tree of clusters, where deeper levels reveal finer-grained relationships. Traditional methods, including linkage approaches, face three major limitations: (i) they always return a hierarchy, even if none exists, (ii) they are restricted to binary trees, even if the true hierarchy is non-binary, and (iii) they are highly sensitive to the choice of linkage function. In this paper, we address these issues by introducing the notion of a valid hierarchy and defining a partial order over the set of valid hierarchies. We prove the existence of a finest valid hierarchy, that is, the hierarchy that encodes the maximum information consistent with the similarity structure of the data set. In particular, the finest valid hierarchy is not constrained to binary structures and, when no hierarchical relationships exist, collapses to a star tree. We propose a simple two-step algorithm that first constructs a binary tree via a linkage method and then prunes it to enforce validity. We establish necessary and sufficient conditions on the linkage function under which this procedure exactly recovers the finest valid hierarchy, and we show that all linkage functions satisfying these conditions yield the same hierarchy after pruning. Notably, classical linkage rules such as single, complete, and average satisfy these conditions, whereas Ward's linkage fails to do so.
Problem

Research questions and friction points this paper is trying to address.

Overcoming binary tree restrictions in hierarchical clustering methods
Addressing sensitivity to linkage function choice in clustering algorithms
Avoiding a forced hierarchy when the data has no nested structure
Innovation

Methods, ideas, or system contributions that make the work stand out.

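Introduces the notion of a valid hierarchy, which is not restricted to binary trees, and proves that a finest valid hierarchy exists
Proposes a two-step algorithm: build a binary tree with a linkage method, then prune it to enforce validity
Establishes necessary and sufficient conditions on the linkage function for exact recovery of the finest valid hierarchy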
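
The two-step procedure described in the abstract (build a binary linkage tree, then prune it) can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the summary does not state the validity criterion, so the sketch assumes a simple stand-in, namely that a cluster is kept only if every within-cluster dissimilarity is strictly smaller than every dissimilarity to a point outside it. The function names build_binary_tree, is_valid, and prune are hypothetical, and the hierarchy is represented as a set of clusters, so dropping a cluster implicitly attaches its children to its parent.

```python
from itertools import combinations


def build_binary_tree(d, linkage="average"):
    """Step 1: naive agglomerative clustering on a dissimilarity matrix d.

    Returns every merged cluster (as a frozenset of point indices),
    which together with the leaves describes a binary tree.
    """
    def dist(a, b):
        vals = [d[i][j] for i in a for j in b]
        if linkage == "single":
            return min(vals)
        if linkage == "complete":
            return max(vals)
        return sum(vals) / len(vals)  # average linkage

    clusters = [frozenset([i]) for i in range(len(d))]
    merged = []
    while len(clusters) > 1:
        # Merge the closest pair; ties are broken by iteration order.
        a, b = min(combinations(clusters, 2), key=lambda p: dist(*p))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merged.append(a | b)
    return merged


def is_valid(cluster, d):
    """ASSUMED criterion (not taken from the paper): strict separation."""
    outside = [k for k in range(len(d)) if k not in cluster]
    if not outside:
        return True  # the root is always kept
    inner = max(d[i][j] for i, j in combinations(cluster, 2))
    outer = min(d[i][k] for i in cluster for k in outside)
    return inner < outer


def prune(merged, d):
    """Step 2: keep only the internal nodes that pass the validity test."""
    return {c for c in merged if is_valid(c, d)}


def two_group_matrix():
    """Points 0-2 are mutually at distance 1, points 3-4 at distance 1,
    and every cross-group distance is 5."""
    group = [0, 0, 0, 1, 1]
    return [[0 if i == j else (1 if group[i] == group[j] else 5)
             for j in range(5)] for i in range(5)]


if __name__ == "__main__":
    d = two_group_matrix()
    for name in ("single", "complete", "average"):
        kept = prune(build_binary_tree(d, name), d)
        print(name, sorted(sorted(c) for c in kept))
        # each line ends with [[0, 1, 2], [0, 1, 2, 3, 4], [3, 4]]
```

On this toy input the binary tree must first merge two of the three equidistant points, but that pair fails the assumed test and is pruned, leaving a node with three children. If all pairwise dissimilarities are equal, only the root survives, which corresponds to the star tree mentioned in the abstract.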