🤖 AI Summary
To address poor recognition of tail classes caused by the long-tailed distribution of food image datasets, this paper proposes a two-stage, fine-tuning-free synthetic augmentation framework built on pre-trained diffusion models (e.g., Stable Diffusion). In the first stage, a reference set is generated for the target class from a positive prompt, and a class that shares similar features with the target is selected as a negative prompt. In the second stage, the synthetic augmentation set is generated under both positive and negative prompt conditions with a combined sampling strategy that promotes intra-class diversity and inter-class separation. Unlike fine-tuning-based methods, the framework does not require a uniformly distributed dataset, and unlike prior approaches built on pre-trained models, it explicitly targets inter-class separation in the synthetic data. Experiments on two long-tailed food benchmark datasets show higher top-1 accuracy than existing synthetic augmentation approaches.
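Pipelines in the Stable Diffusion family typically apply a negative prompt by using its noise prediction in place of the unconditional one in classifier-free guidance. The sketch below shows only that per-step combination, on plain Python lists standing in for latent noise tensors. The function name `guided_noise`, the toy values, and the guidance scale of 7.5 are illustrative assumptions, and the paper's combined sampling strategy is not reproduced here.

```python
from typing import List


def guided_noise(eps_pos: List[float], eps_base: List[float], scale: float) -> List[float]:
    """Classifier-free guidance combination: eps_base + scale * (eps_pos - eps_base).

    eps_pos:  noise prediction conditioned on the positive prompt.
    eps_base: unconditional prediction in plain CFG, or the prediction
              conditioned on the negative prompt when one is supplied.
    Plain lists stand in for latent noise tensors (illustrative assumption).
    """
    if len(eps_pos) != len(eps_base):
        raise ValueError("noise predictions must have the same shape")
    # Extrapolate from the baseline (e.g. the negative class) toward the positive prompt.
    return [b + scale * (p - b) for p, b in zip(eps_pos, eps_base)]


# Toy values: positive-prompt and negative-prompt predictions for two latent entries.
print(guided_noise([1.0, 2.0], [0.5, 1.0], scale=7.5))  # [4.25, 8.5]
```

With `scale=1.0` the result equals the positive-prompt prediction; larger scales push each denoising step further from what the negative prompt predicts.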
📝 Abstract
Deep learning-based food image classification enables precise identification of food categories, further facilitating accurate nutritional analysis. However, real-world food images often show a skewed distribution, with some food types being more prevalent than others. This class imbalance can cause models to favor the majority (head) classes, degrading performance on the less common (tail) classes. Recently, synthetic data augmentation using diffusion-based generative models has emerged as a promising solution to this issue. By generating high-quality synthetic images, these models can help balance the data distribution, potentially improving classification performance. However, existing approaches face challenges: fine-tuning-based methods need a uniformly distributed dataset, while approaches based on pre-trained models often overlook inter-class separation in the synthetic data. In this paper, we propose a two-stage synthetic data augmentation framework that leverages pre-trained diffusion models for long-tailed food classification. We first generate a reference set conditioned on a positive prompt for the generation target and then select a class that shares similar features with the generation target as a negative prompt. Subsequently, we generate a synthetic augmentation set under positive and negative prompt conditions using a combined sampling strategy that promotes intra-class diversity and inter-class separation. We demonstrate the efficacy of the proposed method on two long-tailed food benchmark datasets, where it achieves higher top-1 accuracy than previous works.
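The first stage selects, as the negative prompt, a class that shares similar features with the generation target. The abstract does not say how similarity is measured, so this sketch assumes one plausible criterion: cosine similarity between the mean embedding of the generated reference set and per-class prototype embeddings. The helper names, the toy vectors, and the idea of obtaining embeddings from an image encoder are all hypothetical.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mean_embedding(vectors: List[List[float]]) -> List[float]:
    """Element-wise mean of a list of equal-length embedding vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def select_negative_class(
    target: str,
    reference_embeddings: List[List[float]],
    class_prototypes: Dict[str, List[float]],
) -> str:
    """HYPOTHETICAL criterion: pick the non-target class whose prototype is most
    similar (cosine) to the mean embedding of the target's synthetic reference set."""
    query = mean_embedding(reference_embeddings)
    candidates = [c for c in class_prototypes if c != target]
    return max(candidates, key=lambda c: cosine(query, class_prototypes[c]))


# Toy 3-D embeddings; real ones would come from an image encoder (assumption).
reference = [[1.0, 0.0, 0.2], [0.9, 0.1, 0.1]]  # generated "ramen" reference images
prototypes = {"ramen": [1.0, 0.0, 0.1], "udon": [0.8, 0.2, 0.0], "pizza": [0.0, 1.0, 0.0]}
print(select_negative_class("ramen", reference, prototypes))  # udon
```

The target class is excluded from the candidates so that it can never be chosen as its own negative prompt.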
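The abstract motivates synthetic augmentation as a way to balance the training distribution. One simple way to size that augmentation, assumed here rather than taken from the paper, is to top up every class to the count of the largest (head) class. The helper `synthetic_budget` and the class counts below are hypothetical.

```python
from typing import Dict, Optional


def synthetic_budget(class_counts: Dict[str, int], target: Optional[int] = None) -> Dict[str, int]:
    """Number of synthetic images to generate per class so that every class
    reaches `target` images (default: the size of the largest, head class)."""
    if target is None:
        target = max(class_counts.values())
    # Classes already at or above the target receive no synthetic images.
    return {c: max(0, target - n) for c, n in class_counts.items()}


# Toy long-tailed counts (illustrative, not from the paper's datasets).
counts = {"rice": 500, "sushi": 120, "baklava": 15}
print(synthetic_budget(counts))  # {'rice': 0, 'sushi': 380, 'baklava': 485}
```

Passing a smaller `target` caps generation cost when fully equalizing to the head class would be too expensive.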