Beyond Objects: Contextual Synthetic Data Generation for Fine-Grained Classification

📅 2025-10-28
🤖 AI Summary
To address the few-shot problem in fine-grained image classification, this paper proposes a context-aware synthetic data generation framework built on text-to-image diffusion models. The method introduces two key ideas: (1) a BOB (BeyondOBjects) fine-tuning strategy that explicitly disentangles class-agnostic attributes, such as scene background and object pose, from class identity, preserving the model's generative prior and mitigating overfitting and inter-class confusion; and (2) context features extracted from the limited real samples, which the model conditions on during fine-tuning and marginalizes out during sampling to improve the semantic richness, diversity, and representativeness of the generated images. Evaluated across 24 experimental settings, the approach achieves state-of-the-art performance in 18. On the Aircraft dataset, augmenting only five real images with synthesized data yields a 7.4% accuracy gain over DataDream and surpasses a baseline trained on ten real images.
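One way to read the condition-then-marginalize step (notation ours, not the paper's): with class label $y$ and extracted context attributes $c$, fine-tuning fits a context-conditioned generator, and sampling averages the context away so synthetic images vary over contexts rather than memorizing the few observed ones:

$$p_\theta(x \mid y) = \mathbb{E}_{c \sim p(c)}\left[\, p_\theta(x \mid y, c) \,\right] \approx \frac{1}{K} \sum_{k=1}^{K} p_\theta(x \mid y, c_k), \quad c_k \sim p(c).$$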

📝 Abstract
Text-to-image (T2I) models are increasingly used for synthetic dataset generation, but generating effective synthetic training data for classification remains challenging. Fine-tuning a T2I model with a few real examples can help improve the quality of synthetic training data; however, it may also cause overfitting and reduce diversity in the generated samples. We propose a fine-tuning strategy BOB (BeyondOBjects) to mitigate these concerns for fine-grained classification. Given a small set of real examples, we first extract class-agnostic attributes such as scene background and object pose. We then explicitly condition on these attributes during fine-tuning of the T2I model and marginalize them out during generation. This design mitigates overfitting, preserves the T2I model's generative prior, reduces estimation errors, and further minimizes unintended inter-class associations. Extensive experiments across multiple T2I models, backbones, and datasets show that our method achieves state-of-the-art performance in low-shot fine-grained classification when augmented with synthetic data. Concretely, BOB outperforms DataDream by 7.4% on the Aircraft dataset (from 50.0% to 57.4% when fine-tuning a CLIP classifier with five real images augmented with 100 synthetic images). In three of the four benchmarks, fine-tuning downstream models with 5 real images augmented with BOB achieves better performance than fine-tuning with 10 real images. Collectively, BOB outperforms prior art in 18 of 24 experimental settings, with 2+% accuracy improvements in 14 of these settings.
Problem

Research questions and friction points this paper is trying to address.

Generating diverse synthetic data for fine-grained classification
Mitigating overfitting in text-to-image model fine-tuning
Improving low-shot classification accuracy with contextual attributes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extracts class-agnostic attributes like background and pose
Conditions T2I fine-tuning on attributes but marginalizes during generation
Mitigates overfitting and preserves generative prior for diversity
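The points above can be sketched at the prompt level. The snippet below is an illustrative toy (class and context strings are hypothetical, not the authors' code): during fine-tuning, prompts name both the class and an extracted class-agnostic context, so the context is not absorbed into the class token; at generation time, contexts are drawn independently of the class, approximating the marginalization.

```python
import random

# Context attributes extracted from the few real examples.
# In the paper these come from the images themselves; here
# they are hard-coded placeholders for illustration.
CONTEXTS = ["on a runway", "in flight over clouds", "parked at a gate"]

def finetune_prompt(cls: str, ctx: str) -> str:
    """Fine-tuning prompt: class AND context are both named explicitly,
    which keeps the class token from binding to incidental backgrounds."""
    return f"a photo of a {cls} {ctx}"

def generation_prompts(cls: str, n: int, rng: random.Random) -> list[str]:
    """Sampling-time marginalization: each synthetic image pairs the
    class with a context drawn at random, independent of the class."""
    return [finetune_prompt(cls, rng.choice(CONTEXTS)) for _ in range(n)]

rng = random.Random(0)
prompts = generation_prompts("Boeing 737", 3, rng)
for p in prompts:
    print(p)
```

In a real pipeline these prompts would drive a diffusion model's fine-tuning and sampling loops; the toy only shows how conditioning during training and randomizing during generation decouple class identity from context.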