🤖 AI Summary
This study systematically evaluates how faithfully text-to-image models jointly represent age, nationality, and gender in synthetic portrait generation. Method: We construct a structured, multidimensional prompt library spanning 30 ages (10–78 years), 212 nationalities, and balanced gender combinations; generate portraits using Stable Diffusion and DALL·E 3; and objectively assess age accuracy using the FaceNet+DeepAge and DEX age estimators. Contribution/Results: This work presents the first large-scale, cross-national, cross-age, multi-model benchmark for demographic fidelity in synthetic imagery. Results reveal a mean absolute age estimation error of 8.2 years, with pronounced overestimation for adolescents and underestimation for older adults. Significant nationality–gender interaction effects further amplify demographic distortions. The findings expose critical reliability limitations for high-stakes applications (e.g., forensics, healthcare), and the study proposes a tiered application framework based on data trustworthiness.
📝 Abstract
Text-to-image generative models have shown remarkable progress in producing diverse and photorealistic outputs. In this paper, we present a comprehensive analysis of their effectiveness in creating synthetic portraits that accurately represent various demographic attributes, with a special focus on age, nationality, and gender. Our evaluation employs prompts specifying detailed profiles (e.g., "Photorealistic selfie photo of a 32-year-old Canadian male"), covering a broad spectrum of 212 nationalities, 30 distinct ages from 10 to 78, and balanced gender representation. We score the generated images with two established age estimation models, treating their estimates as ground truth, to assess how faithfully the prompted age is depicted. Our findings reveal that although text-to-image models can consistently generate faces reflecting different identities, how accurately they capture a specified age varies widely, both overall and across demographic backgrounds. These results suggest that current synthetic data may be insufficiently reliable for high-stakes age-related tasks requiring robust precision, unless practitioners are prepared to invest in significant filtering and curation. Nevertheless, such data may still be useful in less sensitive or exploratory applications, where absolute age precision is not critical.
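To make the evaluation setup concrete, the following is a minimal sketch of how such a prompt grid and age-error score could be computed. It is not the authors' code: the short attribute lists stand in for the 212 nationalities and 30 ages, and the helper names (`build_prompt`, `build_prompts`, `age_errors`) are hypothetical. Only the prompt wording follows the example given in the abstract. The estimated ages are assumed to come from an external age estimation model, which is not invoked here.

```python
from itertools import product

# Hypothetical, tiny stand-ins for the study's 212 nationalities and 30 ages.
NATIONALITIES = ["Canadian", "Japanese", "Brazilian"]
AGES = [10, 32, 78]
GENDERS = ["male", "female"]


def article(age):
    """Return 'an' for ages pronounced with a leading vowel sound, else 'a'."""
    return "an" if age in (8, 11, 18) or 80 <= age <= 89 else "a"


def build_prompt(age, nationality, gender):
    """Fill the prompt template quoted in the abstract."""
    return (
        f"Photorealistic selfie photo of {article(age)} "
        f"{age}-year-old {nationality} {gender}"
    )


def build_prompts():
    """Enumerate every nationality x age x gender combination."""
    return [
        {"age": a, "nationality": n, "gender": g, "prompt": build_prompt(a, n, g)}
        for n, a, g in product(NATIONALITIES, AGES, GENDERS)
    ]


def age_errors(prompted_ages, estimated_ages):
    """Return (mean absolute error, mean signed error) in years.

    A positive signed error means the estimator reads the faces as older
    than the prompted age; a negative one means younger.
    """
    if not prompted_ages or len(prompted_ages) != len(estimated_ages):
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [est - tgt for tgt, est in zip(prompted_ages, estimated_ages)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    bias = sum(diffs) / len(diffs)
    return mae, bias
```

Reporting the signed error alongside the absolute error is a design choice of this sketch: it separates overall inaccuracy from directional effects such as faces being read as older or younger than prompted, and both can be computed per age group or per nationality by filtering the inputs.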