Assigning Confidence: K-partition Ensembles

📅 2026-02-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a gap in clustering algorithms such as k-means: they offer no principled way to quantify how reliable each individual sample assignment is. The authors propose CAKE (Confidence in Assignments via K-partition Ensembles), a framework that combines cross-run agreement from a clustering ensemble with local geometric support to produce an interpretable confidence score in [0,1] for each data point. CAKE fuses two statistics computed over the ensemble, assignment stability and consistency of local geometric fit, and is accompanied by a theoretical analysis showing that it remains effective under noise and separates stable from unstable points. Experiments on synthetic and real-world datasets indicate that CAKE highlights ambiguous points and stable core members, yielding a confidence ranking that can guide filtering or prioritization to improve clustering quality.

📝 Abstract
Clustering is widely used for unsupervised structure discovery, yet it offers limited insight into how reliable each individual assignment is. Diagnostics, such as convergence behavior or objective values, may reflect global quality, but they do not indicate whether particular instances are assigned confidently, especially for initialization-sensitive algorithms like k-means. This assignment-level instability can undermine both accuracy and robustness. Ensemble approaches improve global consistency by aggregating multiple runs, but they typically lack tools for quantifying pointwise confidence in a way that combines cross-run agreement with geometric support from the learned cluster structure. We introduce CAKE (Confidence in Assignments via K-partition Ensembles), a framework that evaluates each point using two complementary statistics computed over a clustering ensemble: assignment stability and consistency of local geometric fit. These are combined into a single, interpretable score in [0,1]. Our theoretical analysis shows that CAKE remains effective under noise and separates stable from unstable points. Experiments on synthetic and real-world datasets indicate that CAKE effectively highlights ambiguous points and stable core members, providing a confidence ranking that can guide filtering or prioritization to improve clustering quality.
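The abstract names the two ingredients (assignment stability and local geometric fit over an ensemble of k-means runs) but does not give their formulas or the fusion rule. The sketch below is therefore only an illustration of the general idea, not the authors' definition. Every concrete choice is an assumption made here: stability is measured as how consistently a point is grouped with, or separated from, each other point across runs; geometric fit is the margin between the nearest and second-nearest centroid, averaged over runs; the two are fused by a simple product. The function names `kmeans` and `confidence_scores` are hypothetical.

```python
import math
import random


def kmeans(points, k, seed, iters=50):
    """Plain Lloyd's algorithm on tuples; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random init drives run-to-run variation
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old centroid if a cluster happens to be empty.
            updated.append(tuple(sum(x) / len(members) for x in zip(*members))
                           if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


def confidence_scores(points, k, runs=20):
    """Illustrative per-point score in [0, 1]; NOT the formula from the paper."""
    n = len(points)
    together = [[0] * n for _ in range(n)]  # co-assignment counts over runs
    fit_sum = [0.0] * n
    for seed in range(runs):
        labels, centroids = kmeans(points, k, seed)
        for i, p in enumerate(points):
            # Assumed geometric fit: margin between the two nearest centroids.
            d1, d2 = sorted(math.dist(p, c) for c in centroids)[:2]
            fit_sum[i] += 1.0 - d1 / d2 if d2 > 0 else 0.0
            for j in range(n):
                together[i][j] += labels[i] == labels[j]
    scores = []
    for i in range(n):
        # Assumed stability: a pair that is always together or always apart
        # contributes 1; a pair that flips half of the time contributes 0.
        stability = sum(abs(2 * together[i][j] / runs - 1)
                        for j in range(n) if j != i) / (n - 1)
        scores.append(stability * (fit_sum[i] / runs))  # assumed fusion: product
    return scores


# Two tight groups on a line plus one point halfway between them.
points = [(0.0,), (1.0,), (2.0,), (6.0,), (10.0,), (11.0,), (12.0,)]
scores = confidence_scores(points, k=2)
for p, s in zip(points, scores):
    print(p, round(s, 3))
```

In this toy setup the midpoint at 6.0 receives the lowest score, which matches the abstract's description of ambiguous points being ranked below stable core members. Measuring stability through pairwise co-assignment avoids having to align cluster labels between runs; other stability measures and fusion rules would be equally plausible readings of the abstract.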
Problem

Research questions and friction points this paper is trying to address.

clustering
confidence
assignment reliability
ensemble
k-means
Innovation

Methods, ideas, or system contributions that make the work stand out.

clustering ensemble
assignment confidence
local geometric fit
stability analysis
k-means robustness
Aggelos Semoglou
Department of Informatics, Athens University of Economics and Business, Greece; Archimedes Research Unit, Athena Research Center, Greece
John Pavlopoulos
Athens University of Economics and Business
Machine Learning, NLP, Data Science