When Pretty Isn't Useful: Investigating Why Modern Text-to-Image Models Fail as Reliable Training Data Generators

📅 2026-02-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Although text-to-image (T2I) diffusion models released between 2022 and 2025 have made remarkable progress in generating visually realistic and prompt-aligned images, their synthetic data consistently underperforms when used to train image classifiers. This study systematically evaluates the efficacy of data generated by successive generations of state-of-the-art T2I models through large-scale synthesis, standard classifier training protocols, and cross-model comparative analysis. The findings reveal that the pursuit of aesthetic quality has come at the cost of reduced data diversity and label consistency, producing a disconnect between “generative realism” and “data utility.” Experiments demonstrate that classifiers trained on synthetic data from the latest T2I models suffer significantly degraded accuracy on real-world test sets, indicating that current T2I-generated data is not yet a reliable source for training robust image classifiers.

📝 Abstract
Recent text-to-image (T2I) diffusion models produce visually stunning images and demonstrate excellent prompt following. But do they perform well as synthetic vision data generators? In this work, we revisit the promise of synthetic data as a scalable substitute for real training sets and uncover a surprising performance regression. We generate large-scale synthetic datasets using state-of-the-art T2I models released between 2022 and 2025, train standard classifiers solely on this synthetic data, and evaluate them on real test data. Despite observable advances in visual fidelity and prompt adherence, classification accuracy on real test data consistently declines with newer T2I models as training data generators. Our analysis reveals a hidden trend: these models collapse to a narrow, aesthetic-centric distribution that undermines diversity and label-image alignment. Overall, our findings challenge a growing assumption in vision research, namely that progress in generative realism implies progress in data realism. We thus highlight an urgent need to rethink the capabilities of modern T2I models as reliable training data generators.
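The evaluation protocol the abstract describes, training a classifier only on synthetic data and then scoring it on real test data, can be sketched with toy data. Everything below is an illustrative assumption rather than the paper's actual pipeline: Gaussian point clouds stand in for image datasets, a nearest-centroid rule stands in for the classifier, and a narrow, shifted cloud stands in for an aesthetic-collapsed synthetic distribution.

```python
# Toy sketch of the train-on-synthetic, test-on-real protocol.
# NOT the paper's code: Gaussian clusters replace images, and a
# nearest-centroid rule replaces the trained classifier.
import numpy as np

rng = np.random.default_rng(0)

def make_data(n_per_class, centers, spread):
    """Sample n_per_class points around each class center."""
    X, y = [], []
    for label, c in enumerate(centers):
        X.append(c + spread * rng.standard_normal((n_per_class, len(c))))
        y.append(np.full(n_per_class, label))
    return np.vstack(X), np.concatenate(y)

def fit_centroids(X, y):
    """'Train': compute one mean vector per class label."""
    return np.stack([X[y == k].mean(axis=0) for k in np.unique(y)])

def accuracy(centroids, X, y):
    """'Test': assign each point to its nearest centroid and score."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return float((d.argmin(axis=1) == y).mean())

real_centers = np.array([[0.0, 0.0], [3.0, 3.0]])

# Diverse synthetic data roughly matches the real distribution;
# "collapsed" synthetic data is narrow and shifted off-distribution.
X_div, y_div = make_data(200, real_centers + 0.1, spread=1.0)
X_col, y_col = make_data(200, real_centers + 1.5, spread=0.2)
X_real, y_real = make_data(500, real_centers, spread=1.0)

acc_diverse = accuracy(fit_centroids(X_div, y_div), X_real, y_real)
acc_collapsed = accuracy(fit_centroids(X_col, y_col), X_real, y_real)
print(f"diverse synthetic -> real accuracy:   {acc_diverse:.2f}")
print(f"collapsed synthetic -> real accuracy: {acc_collapsed:.2f}")
```

The toy run shows the mechanism the abstract attributes to newer T2I models: when the synthetic training distribution narrows and drifts from the real one, real-test accuracy drops even though each synthetic sample looks "cleaner" (lower spread).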
Problem

Research questions and friction points this paper is trying to address.

text-to-image models
synthetic data
training data generation
data realism
classification accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

synthetic data
text-to-image models
data realism
distribution collapse
training data generation