Identifying Hard Noise in Long-Tailed Sample Distribution

📅 2022-07-27
🏛️ European Conference on Computer Vision
📈 Citations: 26
Influential: 3
🤖 AI Summary
To address the challenge of identifying and cleansing “hard noise” in long-tailed distributions, this paper introduces the novel problem of Noisy Long-Tailed Classification (NLT). We propose an iterative Hard-to-Easy (H2E) self-bootstrapping denoising framework: a distribution-invariant noise detector dynamically assesses noise hardness, while context-agnostic feature disentanglement, multi-stage reweighting, and progressive cleaning transform hard noise into easy noise. Our method jointly enhances classifier robustness and denoising accuracy on both long-tailed and balanced data. Extensive experiments on three benchmarks—ImageNet-NLT, Animal10-NLT, and Food101-NLT—demonstrate substantial improvements over state-of-the-art methods: +5.2% average accuracy under long-tailed settings, with no degradation in balanced-scenario performance.
📝 Abstract
Conventional de-noising methods rely on the assumption that all samples are independent and identically distributed, so the resultant classifier, though disturbed by noise, can still easily identify the noises as outliers of the training distribution. However, the assumption is unrealistic in large-scale data, which is inevitably long-tailed. Such imbalanced training data makes a classifier less discriminative for the tail classes, whose previously "easy" noises are now turned into "hard" ones -- they are almost as outlying as the clean tail samples. We introduce this new challenge as Noisy Long-Tailed Classification (NLT). Not surprisingly, we find that most de-noising methods fail to identify the hard noises, resulting in significant performance drops on the three proposed NLT benchmarks: ImageNet-NLT, Animal10-NLT, and Food101-NLT. To this end, we design an iterative noisy learning framework called Hard-to-Easy (H2E). Our bootstrapping philosophy is to first learn a classifier as a noise identifier invariant to class and context distributional changes, reducing "hard" noises to "easy" ones, whose removal further improves the invariance. Experimental results show that our H2E outperforms state-of-the-art de-noising methods and their ablations on long-tailed settings while maintaining stable performance on the conventional balanced settings. Datasets and codes are available at https://github.com/yxymessi/H2E-Framework
Problem

Research questions and friction points this paper is trying to address.

Identifying hard noise in long-tailed imbalanced datasets
Developing noise-robust classifiers for tail class samples
Creating frameworks that reduce hard noise to easy noise
Innovation

Methods, ideas, or system contributions that make the work stand out.

Iterative framework reduces hard noise to easy noise
Learns a noise identifier invariant to class and context distribution changes
Bootstrapping improves the classifier by removing the easy noise
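The iterative idea above can be sketched as a minimal, self-contained loop: fit a per-class "classifier" (here just class centroids, reweighted so head and tail classes count equally), score each sample by how much closer it sits to the wrong class's centroid than to its own, down-weight the worst offenders as detected noise, and refit. All names here (`make_data`, `h2e_denoise`, the centroid model) are hypothetical simplifications for illustration, not the paper's actual architecture or training procedure.

```python
import random

def make_data(seed=0, n_head=200, n_tail=20):
    """Synthetic long-tailed 1D data: a head class near 0, a tail class
    near 4. Every 5th tail sample is replaced by a head-like sample while
    keeping the tail label, simulating 'hard' label noise."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n_head):
        xs.append(rng.gauss(0.0, 1.0)); ys.append(0)
    for _ in range(n_tail):
        xs.append(rng.gauss(4.0, 1.0)); ys.append(1)
    noisy = set()
    for i in range(n_head, n_head + n_tail, 5):
        xs[i] = rng.gauss(0.0, 0.5)  # head-like feature, tail label
        noisy.add(i)
    return xs, ys, noisy

def class_means(xs, ys, weights):
    """Weighted per-class centroids; zero-weight (removed) samples
    no longer influence the fit."""
    sums, tot = {0: 0.0, 1: 0.0}, {0: 0.0, 1: 0.0}
    for x, y, w in zip(xs, ys, weights):
        sums[y] += w * x
        tot[y] += w
    return {c: sums[c] / max(tot[c], 1e-9) for c in (0, 1)}

def h2e_denoise(xs, ys, rounds=3, drop_frac=0.1):
    """Iterative hard-to-easy sketch: in each round, refit centroids on the
    currently kept samples, score each sample by (distance to own centroid)
    minus (distance to the other centroid), then zero out the highest-scoring
    samples as suspected noise. Returns the set of flagged indices."""
    n = len(xs)
    weights = [1.0] * n
    flagged = set()
    for _ in range(rounds):
        mu = class_means(xs, ys, weights)
        scores = [abs(x - mu[y]) - abs(x - mu[1 - y]) for x, y in zip(xs, ys)]
        order = sorted(range(n), key=lambda i: scores[i], reverse=True)
        for i in order[: int(drop_frac * n)]:
            weights[i] = 0.0
            flagged.add(i)
    return flagged
```

Each removal round makes the tail centroid cleaner, which in turn makes the remaining mislabeled samples score as more obvious outliers -- the "hard noise becomes easy noise" effect, in miniature.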
Xuanyu Yi
ByteDance Seed
3D Vision, Generative Model
Kaihua Tang
Nanyang Technological University
Computer Vision, Machine Learning, Artificial Intelligence
Xian-Sheng Hua
Damo Academy, Alibaba Group, Hangzhou, China
Joo-Hwee Lim
Institute for Infocomm Research, Singapore
Hanwang Zhang
Nanyang Technological University, Singapore