Can We Treat Noisy Labels as Accurate?

📅 2024-05-21

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

Noisy labels, particularly instance-dependent noise arising from ambiguous instance features, severely degrade model generalization. Conventional transition-matrix-based label correction methods struggle to capture the complex dependencies between instances and their noisy labels. To address this, we propose EchoAlign, a “trust-the-label, tune-the-feature” paradigm: instead of correcting labels, we actively edit instance features to align with noisy labels. EchoAlign combines EchoMod, a feature editing module built upon controllable generative models, with EchoSelect, a distribution-aware, threshold-adaptive sample selection mechanism that jointly optimizes alignment accuracy and data distribution consistency. Under 30% instance-dependent noise and at 99% selection accuracy, EchoSelect retains nearly twice as many samples as the previous best method. Experiments on three benchmark datasets show substantial gains over existing methods.

📝 Abstract

Noisy labels significantly hinder the accuracy and generalization of machine learning models, particularly due to ambiguous instance features. Traditional techniques that attempt to correct noisy labels directly, such as those using transition matrices, often fail to address the inherent complexities of the problem sufficiently. In this paper, we introduce EchoAlign, a transformative paradigm shift in learning from noisy labels. Instead of focusing on label correction, EchoAlign treats noisy labels ($\tilde{Y}$) as accurate and modifies corresponding instance features ($X$) to achieve better alignment with $\tilde{Y}$. EchoAlign's core components are (1) EchoMod: Employing controllable generative models, EchoMod precisely modifies instances while maintaining their intrinsic characteristics and ensuring alignment with the noisy labels. (2) EchoSelect: Instance modification inevitably introduces distribution shifts between training and test sets. EchoSelect maintains a significant portion of clean original instances to mitigate these shifts. It leverages the distinct feature similarity distributions between original and modified instances as a robust tool for accurate sample selection. This integrated approach yields remarkable results. In environments with 30% instance-dependent noise, even at 99% selection accuracy, EchoSelect retains nearly twice the number of samples compared to the previous best method. Notably, on three datasets, EchoAlign surpasses previous state-of-the-art techniques with a substantial improvement.
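The selection idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the exact criterion, so the code assumes that an original instance is kept when its feature vector stays close (by cosine similarity) to the feature vector of its modified counterpart, and that the modified instance is used otherwise. The function names, the fixed `threshold`, and the toy feature vectors are all hypothetical.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Row-wise cosine similarity between two (n, d) feature matrices."""
    a_norm = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)
    b_norm = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    return np.sum(a_norm * b_norm, axis=1)


def select_instances(orig_feats: np.ndarray, mod_feats: np.ndarray, threshold: float):
    """Return a boolean mask (True = keep the original instance) and the similarities.

    Assumption (not from the paper): high similarity between an original
    instance and its label-aligned modification suggests the original
    already matches its label, so it is treated as clean and retained.
    """
    sims = cosine_similarity(orig_feats, mod_feats)
    keep_original = sims >= threshold
    return keep_original, sims


# Toy features (hypothetical): three instances in a 2-D feature space.
orig_feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
mod_feats = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.9]])

keep_original, sims = select_instances(orig_feats, mod_feats, threshold=0.9)

# Build the training set: originals where kept, modified instances elsewhere.
train_feats = np.where(keep_original[:, None], orig_feats, mod_feats)
```

In this toy setup the second instance changes direction entirely after modification, so its modified version replaces it, while the other two originals are retained. A fixed threshold is used only for simplicity; the summary describes EchoSelect's threshold as adaptive.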

Problem

Research questions and friction points this paper is trying to address.

Addressing noisy labels from ambiguous features in machine learning

Proposing instance modification instead of label correction to handle label noise

Mitigating distribution shifts while preserving intrinsic instance characteristics

Innovation

Methods, ideas, or system contributions that make the work stand out.

EchoAlign modifies instances to align with noisy labels

EchoMod uses controllable generative models to modify instances while preserving their intrinsic characteristics

EchoSelect retains clean original instances using feature similarity distributions between original and modified instances
