🤖 AI Summary
Active learning performance degrades on messy, uncurated data pools because datapoints vary widely in how relevant they are to the target task. Method: Unlike existing approaches that rely on fixed unsupervised or pretrained representations and focus on modifying the acquisition function, this paper proposes task-driven representations that are periodically updated during the active learning process using the labels collected so far, either by learning semi-supervised representations directly or by supervised fine-tuning of an initial unsupervised representation, so that the representation captures task-relevant discriminative information. Contribution/Results: The core idea is to couple representation learning with query selection, yielding a closed representation–labeling–learning loop. Empirically, both update strategies significantly outperform static unsupervised or pretrained representations, improving labeling efficiency and final model accuracy.
📝 Abstract
Active learning has the potential to be especially useful for messy, uncurated pools where datapoints vary in relevance to the target task. However, state-of-the-art approaches to this problem currently rely on using fixed, unsupervised representations of the pool, focusing on modifying the acquisition function instead. We show that this model setup can undermine their effectiveness at dealing with messy pools, as such representations can fail to capture important information relevant to the task. To address this, we propose using task-driven representations that are periodically updated during the active learning process using the previously collected labels. We introduce two specific strategies for learning these representations, one based on directly learning semi-supervised representations and the other based on supervised fine-tuning of an initial unsupervised representation. We find that both significantly improve empirical performance over using unsupervised or pretrained representations.
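The closed loop described above (represent the pool, query labels, then refine the representation with those labels) can be sketched in a toy form. Everything below is a hypothetical illustration, not the paper's implementation: the linear projection, the distance-based acquisition rule, and the label-driven update are minimal stand-ins for the representation network, the acquisition function, and the semi-supervised or fine-tuning update.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "messy" pool: 200 points, 5 raw features, with an oracle
# that labels points on request (only the first two features matter).
X_pool = rng.normal(size=(200, 5))
y_true = (X_pool[:, 0] + X_pool[:, 1] > 0).astype(int)

def represent(X, W):
    """Linear stand-in for the learned representation."""
    return X @ W

def acquire(Z, labeled_idx, k):
    """Toy acquisition rule: pick the k points farthest from the
    labeled centroid in the current representation space."""
    centroid = Z[labeled_idx].mean(axis=0)
    dists = np.linalg.norm(Z - centroid, axis=1)
    dists[labeled_idx] = -np.inf  # never re-query labeled points
    return np.argsort(dists)[-k:]

def update_representation(X, y, idx, W):
    """Toy supervised update (stand-in for fine-tuning): nudge the
    projection toward directions correlated with the labels seen so far."""
    Xl, yl = X[idx], y[idx]
    grad = Xl.T @ (yl - yl.mean())
    return W + 0.1 * grad[:, None] * np.ones((1, W.shape[1])) / len(idx)

W = rng.normal(size=(5, 2))  # initial (unsupervised) projection
labeled = list(rng.choice(200, 5, replace=False))

for step in range(4):  # closed loop: represent -> query -> label -> update
    Z = represent(X_pool, W)
    new = acquire(Z, np.array(labeled), k=5)
    labeled.extend(new.tolist())  # oracle supplies y_true[new]
    W = update_representation(X_pool, y_true, np.array(labeled), W)

print(len(labeled))  # 25 labels collected over 4 rounds
```

The point of the sketch is the control flow: with a fixed representation, `represent` would use the same `W` every round; here the representation is re-fit to the growing labeled set before each new query, which is what lets acquisition focus on task-relevant structure in the pool.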