Classifying Long-tailed and Label-noise Data via Disentangling and Unlearning

📅 2025-03-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

In real-world datasets, long-tailed label distributions frequently co-occur with label noise, and the two are coupled: tail-class samples are more likely to be mislabeled as head-class ones (“tail-to-head”, or T2H, noise), which contaminates the head classes and worsens the original imbalance. This work characterizes T2H noise and proposes Disentangling and Unlearning for Long-tailed and Label-noisy data (DULL). Its core contributions are: (1) Inner-Feature Disentangling (IFD), which disentangles features internally; (2) Inner-Feature Partial Unlearning (IFPU), which weakens and unlearns feature regions correlated with wrong classes; and (3) a noise addition algorithm that simulates T2H noise for controlled experiments. Experiments on both simulated and real-world datasets demonstrate the method's effectiveness and its robustness to noisy labels.

📝 Abstract

In real-world datasets, the challenges of long-tailed distributions and noisy labels often coexist, posing obstacles to model training and performance. Existing studies on long-tailed noisy label learning (LTNLL) typically assume that the generation of noisy labels is independent of the long-tailed distribution, which may not hold in practice. In real-world situations, we observe that tail-class samples are more likely to be mislabeled as head classes, exacerbating the original degree of imbalance. We call this phenomenon “tail-to-head” (T2H) noise. T2H noise severely degrades model performance by polluting the head classes and forcing the model to learn tail samples as head-class samples. To address this challenge, we investigate the dynamic misleading process of noisy labels and propose a novel method called Disentangling and Unlearning for Long-tailed and Label-noisy data (DULL). It first employs Inner-Feature Disentangling (IFD) to disentangle features internally. Based on this, Inner-Feature Partial Unlearning (IFPU) is then applied to weaken and unlearn incorrect feature regions correlated with wrong classes. This prevents the model from being misled by noisy labels and enhances its robustness against noise. To provide a controlled experimental environment, we further propose a new noise addition algorithm to simulate T2H noise. Extensive experiments on both simulated and real-world datasets demonstrate the effectiveness of our proposed method.

Problem

Research questions and friction points this paper is trying to address.

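Long-tailed distributions and noisy labels often coexist in real-world data and jointly hinder training

Existing LTNLL methods assume label noise is independent of the long-tailed distribution

Tail-to-head (T2H) noise pollutes head classes and worsens the original imbalance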

Innovation

Methods, ideas, or system contributions that make the work stand out.

Disentangling features internally using IFD

Unlearning incorrect features via IFPU

Simulating T2H noise for controlled experiments
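
The abstract does not spell out the noise addition algorithm, so the following is only an illustrative sketch of what T2H-style noise injection could look like, not the authors' procedure. It assumes (hypothetically) that each sample is flipped with probability `noise_rate` and that the new label is drawn from the classes more frequent than the true class, weighted by class frequency. The function name `add_t2h_noise` and its parameters are invented for this example.

```python
import numpy as np


def add_t2h_noise(labels, noise_rate, num_classes, seed=0):
    """Flip some labels toward more frequent classes (hypothetical T2H scheme)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    # Class frequencies of the clean labels define which classes count as "head".
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    noisy = labels.copy()
    # Assumption: each sample is selected for flipping independently.
    flip = rng.random(labels.shape[0]) < noise_rate
    for i in np.flatnonzero(flip):
        y = labels[i]
        # Candidate targets are the classes strictly more frequent than class y,
        # weighted by their frequency; all other classes get zero probability.
        weights = np.where(counts > counts[y], counts, 0.0)
        if weights.sum() == 0:
            # The most frequent class has no head-ward target, so it stays clean.
            continue
        noisy[i] = rng.choice(num_classes, p=weights / weights.sum())
    return noisy


# Toy long-tailed label set: 100 head, 30 medium, 10 tail samples.
clean = np.array([0] * 100 + [1] * 30 + [2] * 10)
noisy = add_t2h_noise(clean, noise_rate=0.5, num_classes=3)
print("clean counts:", np.bincount(clean, minlength=3))
print("noisy counts:", np.bincount(noisy, minlength=3))
```

Under this assumed scheme, labels only ever move toward more frequent classes, so the head class can only grow and the rarest class can only shrink, which mirrors the imbalance-exacerbating effect described in the abstract.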
