🤖 AI Summary
Clustering mixed-type tabular data is hard for three reasons: numerical and categorical features are represented inconsistently, feature importance is uneven and context-dependent, and explanations are typically produced post hoc, disconnected from the clustering itself. This work proposes WISE, an unsupervised framework that unifies heterogeneous feature alignment, multi-view feature weighting, clustering, and intrinsic interpretability in a single self-explaining pipeline. WISE uses Binary Encoding with Padding (BEP) to align features in a unified sparse space and a Leave-One-Feature-Out (LOFO) strategy to obtain multiple feature-weighting views, followed by a two-stage weight-aware clustering procedure and Discriminative FreqItems (DFI), which yield feature-level explanations that are consistent from instances to clusters and carry an additive decomposition guarantee. On six real-world datasets, WISE consistently outperforms classical and neural baselines in clustering quality while remaining efficient, and produces faithful, human-interpretable explanations.
📝 Abstract
Clustering mixed-type tabular data is fundamental for exploratory analysis, yet it remains challenging for three reasons: numerical and categorical representations are misaligned, feature relevance is uneven and context-dependent, and explanations are post-hoc and disconnected from the clustering process. We propose WISE, a Weight-Informed Self-Explaining framework that unifies representation, feature weighting, clustering, and interpretation in a fully unsupervised and transparent pipeline. WISE introduces Binary Encoding with Padding (BEP) to align heterogeneous features in a unified sparse space, a Leave-One-Feature-Out (LOFO) strategy to derive multiple high-quality and diverse feature-weighting views, and a two-stage weight-aware clustering procedure to aggregate alternative semantic partitions. To ensure intrinsic interpretability, we further develop Discriminative FreqItems (DFI), which yields feature-level explanations that are consistent from instances to clusters with an additive decomposition guarantee. Extensive experiments on six real-world datasets demonstrate that WISE consistently outperforms classical and neural baselines in clustering quality while remaining efficient, and that it produces faithful, human-interpretable explanations grounded in the same primitives that drive clustering.
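
As a rough illustration of what an additive decomposition of a feature-weighted distance can look like, the sketch below splits a weighted mismatch distance between a record and a cluster prototype into per-feature contributions that sum exactly to the total. This is not the WISE implementation: the abstract does not specify the distance, the encoding, or how weights are learned, so the mismatch distance, the function names, the weights, and the toy records are all assumptions made for illustration, and numerical features are assumed to be already discretised into categories.

```python
from typing import Dict, Hashable

Record = Dict[str, Hashable]


def feature_contributions(a: Record, b: Record,
                          weights: Dict[str, float]) -> Dict[str, float]:
    """Per-feature share of a weighted mismatch distance (hypothetical helper).

    A feature contributes its weight when the two records disagree on it,
    and 0.0 when they agree.
    """
    return {f: w * (a[f] != b[f]) for f, w in weights.items()}


def weighted_distance(a: Record, b: Record,
                      weights: Dict[str, float]) -> float:
    """Total distance, defined as the sum of the per-feature contributions."""
    return sum(feature_contributions(a, b, weights).values())


# Toy data (assumed): "size" stands in for a binned numerical feature.
weights = {"color": 0.5, "size": 0.25, "shape": 0.25}
record = {"color": "red", "size": "large", "shape": "round"}
prototype = {"color": "blue", "size": "large", "shape": "square"}

contributions = feature_contributions(record, prototype, weights)
total = weighted_distance(record, prototype, weights)
```

Because the total is defined as the sum of the contributions, each feature's share can be read directly as its part of the explanation for why the record is close to or far from the prototype. Whether WISE's guarantee takes this particular form is not stated in the abstract.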