Sample Selection via Contrastive Fragmentation for Noisy Label Regression

πŸ“… 2025-02-25
πŸ›οΈ Neural Information Processing Systems
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
To address the performance degradation that noisy labels cause in regression tasks, this paper proposes ConFrag, a framework that combines contrastive learning with fragmented modeling of the label range. Exploiting the continuous, ordinal correlation between labels and features, ConFrag transforms the data into disjoint yet contrasting fragment pairs, yielding more discriminative representations for selecting clean samples. Two further contributions support this: (i) a neighborhood-agreement criterion across multiple expert feature extractors for identifying clean samples, and (ii) an Error Residual Ratio (ERR) metric that better accounts for varying degrees of label noise. Evaluated on six newly curated noisy regression benchmarks spanning diverse domains (age, price, and music production year prediction), ConFrag consistently outperforms fourteen state-of-the-art baselines under both symmetric and random Gaussian label noise.

πŸ“ Abstract
As with many other problems, real-world regression is plagued by the presence of noisy labels, an inevitable issue that demands our attention. Fortunately, much real-world data often exhibits an intrinsic property of continuously ordered correlations between labels and features, where data points with similar labels are also represented with closely related features. In response, we propose a novel approach named ConFrag, where we collectively model the regression data by transforming them into disjoint yet contrasting fragmentation pairs. This enables the training of more distinctive representations, enhancing the ability to select clean samples. Our ConFrag framework leverages a mixture of neighboring fragments to discern noisy labels through neighborhood agreement among expert feature extractors. We extensively perform experiments on six newly curated benchmark datasets of diverse domains, including age prediction, price prediction, and music production year estimation. We also introduce a metric called Error Residual Ratio (ERR) to better account for varying degrees of label noise. Our approach consistently outperforms fourteen state-of-the-art baselines, being robust against symmetric and random Gaussian label noise.
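The selection idea described above (bin the continuous label range into fragments, then keep samples whose feature-space neighbors agree on their fragment) can be sketched in a toy, single-extractor form. Everything below — the quantile binning, the brute-force k-NN, the choice of `k` and the agreement threshold — is an illustrative assumption, not the paper's actual ConFrag implementation, which uses contrastive fragment pairs and multiple expert feature extractors.

```python
# Illustrative sketch of neighborhood-agreement sample selection.
# All parameter choices here are assumptions for demonstration only.
import numpy as np

def select_clean(features, noisy_labels, num_fragments=4, k=10, agree_ratio=0.6):
    """Keep samples whose k nearest feature-space neighbors mostly fall
    in the same label fragment (coarse bin of the label range)."""
    # Assign each sample to a fragment by quantile-binning its label.
    edges = np.quantile(noisy_labels, np.linspace(0.0, 1.0, num_fragments + 1))
    frag = np.clip(np.searchsorted(edges, noisy_labels, side="right") - 1,
                   0, num_fragments - 1)

    # Brute-force pairwise distances (fine for a toy; use an ANN index at scale).
    d = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # exclude self from neighbors
    neighbors = np.argsort(d, axis=1)[:, :k]

    # Neighborhood agreement: fraction of neighbors sharing the sample's fragment.
    agreement = (frag[neighbors] == frag[:, None]).mean(axis=1)
    return agreement >= agree_ratio        # boolean "clean" mask

# Toy data: features track the true target, then some labels are corrupted.
rng = np.random.default_rng(0)
n = 200
t = rng.uniform(0.0, 1.0, n)                          # true targets
X = np.stack([t, 0.01 * rng.normal(size=n)], axis=1)  # features correlate with t
y = t.copy()
y[:30] = rng.uniform(0.0, 1.0, 30)                    # inject label noise
mask = select_clean(X, y)
```

Because the corrupted samples sit among neighbors whose labels still reflect the true target, their fragment assignment tends to disagree with the neighborhood, so they are filtered out at a higher rate than clean samples.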
Problem

Research questions and friction points this paper is trying to address.

Address noisy labels in regression tasks.
Enhance sample selection via contrastive fragmentation.
Improve robustness against symmetric and Gaussian noise.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Contrastive fragmentation for noisy label regression
Neighborhood agreement among expert feature extractors
Error Residual Ratio metric for label noise