🤖 AI Summary
This survey addresses the lack of systematic, up-to-date overviews of text-to-image (T2I) generation through a comprehensive analysis of 141 representative works published between 2021 and 2024. Methodologically, it categorizes four foundational architectures (autoregressive, non-autoregressive, GAN-based, and diffusion models) and also covers other directions, including energy-based models, Mamba, and multimodal modeling. It compares the surveyed methods across generation and editing paradigms, datasets, evaluation metrics, training resource requirements, and inference speed. Its contributions include: (1) an examination of the potential social impact of T2I, together with proposed solutions; (2) identification of techniques shared by well-performing models, e.g., classifier-free guidance and the combined design of attention and encoders; and (3) a side-by-side performance comparison and a discussion of possible future directions, intended as structured guidance for future research.
📝 Abstract
Text-to-image (T2I) generation refers to the text-guided generation of high-quality images. In the past few years, T2I has attracted widespread attention, and numerous works have emerged. In this survey, we comprehensively review 141 works published from 2021 to 2024. First, we introduce four foundation model architectures for T2I (autoregressive, non-autoregressive, GAN, and diffusion) and the commonly used key technologies (autoencoders, attention, and classifier-free guidance). Second, we systematically compare the methods of these studies in two directions, T2I generation and T2I editing, including the encoders and key technologies they use. We also compare the performance of these studies side by side in terms of datasets, evaluation metrics, training resources, and inference speed. Beyond the four foundation models, we survey other work on T2I, such as energy-based models and the more recent Mamba and multimodal approaches. We further investigate the potential social impact of T2I and propose some solutions. Finally, we offer insights into improving the performance of T2I models and outline possible future development directions. In summary, this survey is the first systematic and comprehensive overview of T2I, aiming to provide a valuable guide for future researchers and to stimulate continued progress in this field.