Sparse clustering via the Deterministic Information Bottleneck algorithm

📅 2026-01-28

🤖 AI Summary
This work addresses clustering of high-dimensional data in which the cluster structure is sparse, that is, only a subset of the features is discriminative. In this setting, traditional methods struggle to cluster and select features at the same time. The study applies the deterministic information bottleneck (DIB) to sparse clustering, proposing an information-theoretic framework that jointly optimizes feature weights and cluster assignments, so that feature importance is learned while the cluster structure is uncovered. Simulations on synthetic data indicate that the method is a competitive alternative to existing sparse clustering algorithms, and an application to a real-world genomics data set illustrates its effectiveness.
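
For orientation, the general DIB objective from the information bottleneck literature can be written as below. This is background only, not the paper's formulation: the symbols X (the data), T (the cluster assignment), Y (the relevance variable) and the trade-off parameter β are assumptions taken from the standard DIB setup, and how the paper introduces feature weights into this objective is not stated on this page.

```latex
% Standard DIB objective: compress X into T while retaining information about Y
\min_{q(t \mid x)} \; \mathcal{L}_{\mathrm{DIB}} = H(T) - \beta\, I(T;Y)

% The minimizer is a hard (deterministic) assignment, iterated to convergence
t^{*}(x) = \arg\max_{t} \Big[ \log q(t) - \beta\, D_{\mathrm{KL}}\big( p(y \mid x) \,\|\, q(y \mid t) \big) \Big]
```

Replacing the compression term I(X;T) of the original information bottleneck with the entropy H(T) is what makes the optimal assignment deterministic, which is why DIB is a natural fit for hard clustering.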

📝 Abstract
Cluster analysis is the task of assigning objects to groups that ideally exhibit some desirable characteristics. When the cluster structure is confined to a subset of the feature space, traditional clustering techniques struggle to recover it. We present an information-theoretic framework that addresses this sparse setting by performing feature weighting and clustering jointly. Simulations on synthetic data show that our proposal is a competitive alternative to existing clustering algorithms for sparse data. The effectiveness of our method is further demonstrated by an application to a real-world genomics data set.
Problem

Research questions and friction points this paper is trying to address.

sparse clustering
feature selection
cluster analysis
high-dimensional data
information bottleneck
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deterministic Information Bottleneck
sparse clustering
feature weighting
information-theoretic framework
joint clustering