Noisy Label Refinement with Semantically Reliable Synthetic Images

📅 2025-09-04

🤖 AI Summary

Semantic label noise, where visually similar categories are mislabeled as one another, is common in image classification datasets and degrades supervised model performance. To address this, we propose a synthetic-image-guided label refinement framework: using text-to-image models (e.g., Stable Diffusion), we generate synthetic images whose labels are reliable and use them as reference anchors to detect mislabeled real samples and correct their labels. Because domain gaps and limited diversity make synthetic images a poor direct substitute for training data, they serve only as references, not as the training set. The method is orthogonal to existing noise-robust learning techniques and can be combined with them without modifying them. When combined with state-of-the-art noise-robust training methods, it improves accuracy by 30% on CIFAR-10 and by 11% on CIFAR-100 under 70% semantic noise, and by 24% on ImageNet-100 under real-world noise conditions. To our knowledge, this is the first work to systematically leverage synthetic semantic priors to enhance the label quality of real-world datasets.

📝 Abstract

Semantic noise in image classification datasets, where visually similar categories are frequently mislabeled, poses a significant challenge to conventional supervised learning approaches. In this paper, we explore the potential of using synthetic images generated by advanced text-to-image models to address this issue. Although these high-quality synthetic images come with reliable labels, their direct application in training is limited by domain gaps and diversity constraints. Unlike conventional approaches, we propose a novel method that leverages synthetic images as reliable reference points to identify and correct mislabeled samples in noisy datasets. Extensive experiments across multiple benchmark datasets show that our approach significantly improves classification accuracy under various noise conditions, especially in challenging scenarios with semantic label noise. Additionally, since our method is orthogonal to existing noise-robust learning techniques, when combined with state-of-the-art noise-robust training methods, it achieves superior performance, improving accuracy by 30% on CIFAR-10 and by 11% on CIFAR-100 under 70% semantic noise, and by 24% on ImageNet-100 under real-world noise conditions.

Problem

Research questions and friction points this paper is trying to address.

Addressing semantic label noise in image classification datasets

Leveraging synthetic images to correct mislabeled samples

Improving classification accuracy under various noise conditions
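To make "semantic label noise" concrete, the sketch below flips a fraction of labels to a visually similar class instead of to a uniformly random one. It is only an illustration: the function name `inject_semantic_noise`, the `similar_class` mapping, and the cat/dog and automobile/truck pairs are assumptions chosen for the example, and the paper's exact noise protocol is not described on this page.

```python
import numpy as np

def inject_semantic_noise(labels, similar_class, noise_rate, seed=0):
    """Flip labels to a visually similar class with probability `noise_rate`.

    labels        : (n,) integer array of clean labels
    similar_class : dict {class_id: confusable_class_id} (hypothetical mapping)
    noise_rate    : probability of flipping a sample whose class is in the mapping
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    noisy = labels.copy()
    # Draw one Bernoulli(noise_rate) decision per sample.
    flip = rng.random(labels.shape[0]) < noise_rate
    for i in np.flatnonzero(flip):
        # Classes without a confusable partner keep their original label.
        noisy[i] = similar_class.get(int(labels[i]), int(labels[i]))
    return noisy

# CIFAR-10 indices: 1 = automobile, 3 = cat, 5 = dog, 9 = truck.
# The pairing itself is an assumption made for this example.
pairs = {3: 5, 5: 3, 1: 9, 9: 1}
clean = np.array([3, 5, 1, 9, 0, 2] * 100)
noisy = inject_semantic_noise(clean, pairs, noise_rate=0.7)
print("fraction of labels changed:", (noisy != clean).mean())
```

Unlike symmetric noise, every corrupted label here still points to a plausible class, which is what makes this kind of noise hard to detect from the loss value alone.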

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses synthetic images as reliable reference points

Identifies and corrects mislabeled samples in datasets

Combines with existing noise-robust training methods
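One simple way to picture the ideas listed above is a prototype comparison in feature space. The sketch below is an assumption-laden illustration, not the paper's algorithm: it presumes that features for synthetic and real images have already been extracted by some shared encoder, and the names `class_prototypes`, `refine_labels`, and the `margin` threshold are hypothetical.

```python
import numpy as np

def l2_normalize(x):
    """Scale each row to unit length (guarding against zero vectors)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)

def class_prototypes(synth_feats, synth_labels, num_classes):
    """Average the normalized synthetic features of each class into one anchor.

    Assumes every class has at least one synthetic sample.
    """
    feats = l2_normalize(np.asarray(synth_feats, dtype=float))
    synth_labels = np.asarray(synth_labels)
    protos = np.stack(
        [feats[synth_labels == c].mean(axis=0) for c in range(num_classes)]
    )
    return l2_normalize(protos)

def refine_labels(real_feats, noisy_labels, protos, margin=0.1):
    """Relabel a real sample when another class anchor is clearly closer.

    A sample is flagged when the cosine similarity to its best-matching anchor
    exceeds the similarity to its given label's anchor by more than `margin`.
    """
    noisy_labels = np.asarray(noisy_labels)
    sims = l2_normalize(np.asarray(real_feats, dtype=float)) @ protos.T
    best = sims.argmax(axis=1)
    rows = np.arange(len(noisy_labels))
    gap = sims[rows, best] - sims[rows, noisy_labels]
    suspect = gap > margin
    refined = np.where(suspect, best, noisy_labels)
    return refined, suspect

# Toy 2-D features standing in for encoder outputs.
synth = np.array([[1.0, 0.1], [1.0, -0.1], [0.1, 1.0], [-0.1, 1.0]])
protos = class_prototypes(synth, [0, 0, 1, 1], num_classes=2)
real = np.array([[0.9, 0.05], [0.05, 0.9], [0.8, 0.1]])
refined, suspect = refine_labels(real, [0, 0, 1], protos)
print(refined, suspect)
```

Because the output is just a corrected label array, a step like this can run as preprocessing ahead of any noise-robust training method without changing that method, which is the sense in which the two are combined.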

🔎 Similar Papers

Pseudo-labelling meets Label Smoothing for Noisy Partial Label Learning