Information-Theoretic Active Correlation Clustering

📅 2024-02-05
📈 Citations: 1
Influential: 0
🤖 AI Summary
To address correlation clustering when pairwise similarity information is scarce and queries are costly, this paper proposes an active learning-based clustering method. The approach uses information-theoretic criteria, namely entropy and information gain, as acquisition functions for pairwise similarity queries, so that the most informative sample pairs are selected adaptively. Unlike prior methods, it does not assume a predefined similarity graph structure, and it jointly optimizes the graph partitioning objective end to end under a strict query budget. Experiments on multiple benchmark datasets show that the method achieves higher clustering accuracy (an improvement of 4.2–9.8 percentage points) with significantly fewer queries (35%–50% fewer on average) than both random querying and state-of-the-art active clustering baselines. These results support its better trade-off between annotation efficiency and clustering performance.

📝 Abstract
We study correlation clustering where the pairwise similarities are not known in advance. For this purpose, we employ active learning to query pairwise similarities in a cost-efficient way. We propose a number of effective information-theoretic acquisition functions based on entropy and information gain. We extensively investigate the performance of our methods in different settings and demonstrate their superior performance compared to the alternatives.
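
For orientation, one common way to write the correlation clustering cost is sketched below. The notation is an assumption made here for illustration and is not taken from the abstract: \sigma_{ij} denotes the (possibly unknown) similarity between items i and j, and c_i denotes the cluster label of item i. The cost counts the weight of "disagreements": dissimilar pairs placed together and similar pairs placed apart.

```latex
% Assumed notation (not from the abstract):
%   \sigma_{ij} \in [-1, 1]  similarity between items i and j
%   c_i                      cluster label assigned to item i
R(c) \;=\; \sum_{\substack{i<j \\ c_i = c_j}} \max(0,\, -\sigma_{ij})
      \;+\; \sum_{\substack{i<j \\ c_i \neq c_j}} \max(0,\, \sigma_{ij})
```

In the active setting described in the abstract, the values \sigma_{ij} are not given up front; each one has to be requested through a query, which is why the choice of pairs matters.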
Problem

Research questions and friction points this paper is trying to address.

Develop active learning for correlation clustering with costly pairwise similarities
Introduce information-theoretic acquisition functions to reduce clustering uncertainty
Improve clustering accuracy and query efficiency under budget constraints
Innovation

Methods, ideas, or system contributions that make the work stand out.

Active learning queries pairwise similarities in a cost-efficient way
Information-theoretic acquisition functions based on entropy and information gain decide which pairs to query
Reduces clustering uncertainty effectively under a query budget
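
The idea of an entropy-based acquisition function can be illustrated with a minimal sketch. It is not the paper's method: the summary does not specify how the acquisition functions are computed, so this sketch assumes that uncertainty is represented by a small set of plausible clusterings (the hypothetical `samples` list) and scores each unqueried pair by the binary entropy of its estimated co-clustering probability. All function names are made up for illustration.

```python
import math
from itertools import combinations


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli(p) variable; 0 when p is 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def co_cluster_probability(samples, i, j):
    """Fraction of sampled clusterings that put items i and j together."""
    together = sum(1 for labels in samples if labels[i] == labels[j])
    return together / len(samples)


def select_queries(samples, queried, batch_size):
    """Return the `batch_size` unqueried pairs with the highest entropy.

    Assumption: `samples` is a list of label lists, one per plausible
    clustering, and `queried` is a set of (i, j) pairs with i < j.
    """
    n = len(samples[0])
    scored = []
    for i, j in combinations(range(n), 2):
        if (i, j) in queried:
            continue  # never spend budget on a pair that is already known
        p = co_cluster_probability(samples, i, j)
        scored.append((binary_entropy(p), (i, j)))
    # Highest entropy first; entropies are rounded so that floating-point
    # noise does not affect tie-breaking, which falls back to pair order.
    scored.sort(key=lambda t: (-round(t[0], 9), t[1]))
    return [pair for _, pair in scored[:batch_size]]


# Hypothetical example: three plausible clusterings of four items.
samples = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 1, 1, 1],
]
queried = {(0, 1)}
picked = select_queries(samples, queried, batch_size=2)
print(picked)
```

Pairs on which the sampled clusterings all agree, such as items 0 and 3 in this toy example, receive zero entropy and are never selected ahead of uncertain pairs. An information-gain criterion would instead estimate how much a query is expected to reduce uncertainty about the whole clustering, which this sketch does not attempt.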