Synthetic Data Augmentation using Pre-trained Diffusion Models for Long-tailed Food Image Classification

📅 2025-06-02

🤖 AI Summary

Long-tailed distributions in food image datasets lead to poor recognition of tail classes. To address this, the paper proposes a two-stage synthetic data augmentation framework built on pre-trained diffusion models (e.g., Stable Diffusion) that requires no fine-tuning. The framework uses positive–negative prompt-guided conditional generation: the target class serves as the positive prompt, and a visually similar class serves as the negative prompt. A combined sampling strategy then promotes intra-class diversity while improving inter-class separation. Fine-tuning-based methods require a uniformly distributed dataset, and earlier approaches based on pre-trained models tend to overlook inter-class separation; this framework avoids costly fine-tuning and explicitly reduces class ambiguity in the synthetic data. Experiments on two long-tailed food benchmark datasets show higher top-1 accuracy than existing synthetic augmentation approaches.
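Positive–negative prompt guidance can be illustrated with the negative-prompt form of classifier-free guidance commonly used with Stable Diffusion. In that form, the noise prediction for the negative prompt takes the place of the unconditional (empty-prompt) prediction. The following is a minimal NumPy sketch under that assumption, not the paper's implementation. The function name `guided_noise` is hypothetical, and random arrays stand in for the denoiser's outputs at a single sampling step.

```python
import numpy as np


def guided_noise(eps_pos: np.ndarray, eps_neg: np.ndarray, scale: float) -> np.ndarray:
    """Combine two noise predictions with classifier-free guidance (hypothetical helper).

    eps_pos: denoiser output conditioned on the positive prompt (target class).
    eps_neg: denoiser output conditioned on the negative prompt (similar class);
             in plain classifier-free guidance this slot holds the unconditional
             prediction instead.
    scale:   guidance scale; values > 1 extrapolate toward the positive prompt
             and away from the negative one.
    """
    return eps_neg + scale * (eps_pos - eps_neg)


# Toy usage: random arrays stand in for U-Net outputs on a 4x8x8 latent.
rng = np.random.default_rng(0)
eps_pos = rng.standard_normal((4, 8, 8))
eps_neg = rng.standard_normal((4, 8, 8))
eps = guided_noise(eps_pos, eps_neg, scale=7.5)
print(eps.shape)  # (4, 8, 8)
```

With `scale` set to 1 the result equals the positive-prompt prediction, and with 0 it equals the negative-prompt prediction. Larger values push further away from the negative class. This is one way a negative prompt can steer samples away from a visually similar food category.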

📝 Abstract

Deep learning-based food image classification enables precise identification of food categories, which in turn facilitates accurate nutritional analysis. However, real-world food images often follow a skewed distribution in which some food types are far more prevalent than others. This class imbalance biases models toward the majority (head) classes and degrades performance on the less common (tail) classes. Synthetic data augmentation with diffusion-based generative models has recently emerged as a promising solution: by generating high-quality synthetic images, these models can help balance the data distribution and potentially improve classification performance. However, existing approaches face challenges: fine-tuning-based methods require a uniformly distributed dataset, while approaches based on pre-trained models often overlook inter-class separation in the synthetic data. In this paper, we propose a two-stage synthetic data augmentation framework that leverages pre-trained diffusion models for long-tailed food classification. We first generate a reference set conditioned on a positive prompt describing the generation target, and then select a class that shares similar features with the target to serve as the negative prompt. We then generate a synthetic augmentation set conditioned on both the positive and negative prompts, using a combined sampling strategy that promotes intra-class diversity and inter-class separation. We demonstrate the efficacy of the proposed method on two long-tailed food benchmark datasets, where it achieves higher top-1 accuracy than previous works.
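Two bookkeeping steps surround the first stage: deciding how many synthetic images each class needs, and choosing a negative-prompt class that resembles the target. The sketch below rests on two hypothetical assumptions, neither of which the abstract specifies. First, each class is topped up to the size of the largest (head) class. Second, "shares similar features" means the highest cosine similarity between mean feature vectors of each class's reference images. The function names, class names, and toy vectors are illustrative only.

```python
import numpy as np


def synthetic_counts(class_counts: dict[str, int]) -> dict[str, int]:
    """Synthetic images needed per class to match the largest (head) class (assumed target)."""
    target = max(class_counts.values())
    return {name: target - n for name, n in class_counts.items()}


def pick_negative_class(target: str, centroids: dict[str, np.ndarray]) -> str:
    """Return the non-target class whose feature centroid is most similar to the target's.

    centroids maps class name -> mean feature vector of that class's reference
    images; how the features are extracted (e.g. a pre-trained image encoder)
    is an assumption. Raises ValueError if there is no other class to choose.
    """

    def unit(v: np.ndarray) -> np.ndarray:
        return v / np.linalg.norm(v)

    t = unit(centroids[target])
    others = [name for name in centroids if name != target]
    # Cosine similarity between unit vectors is their dot product.
    return max(others, key=lambda name: float(t @ unit(centroids[name])))


# Toy usage with hypothetical classes and 3-D "features".
counts = {"ramen": 500, "udon": 40, "pho": 120}
print(synthetic_counts(counts))  # {'ramen': 0, 'udon': 460, 'pho': 380}

centroids = {
    "ramen": np.array([1.0, 0.0, 0.0]),
    "udon": np.array([0.9, 0.1, 0.0]),
    "pho": np.array([0.0, 1.0, 0.0]),
}
print(pick_negative_class("ramen", centroids))  # udon
```

In a full pipeline, the selected class name would serve as the negative prompt when generating that class's share of the augmentation set.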

Problem

Research questions and friction points this paper is trying to address.

Addresses class imbalance in food image classification

Improves intra-class diversity and inter-class separation of synthetic data

Enhances accuracy for long-tailed food datasets

Innovation

Methods, ideas, or system contributions that make the work stand out.

Pre-trained diffusion models generate synthetic food images

Two-stage augmentation with positive and negative prompts

Combined sampling enhances intra-class diversity and inter-class separation
