AI Summary
Current text-to-image models perform poorly on prompts that require world knowledge and implicit reasoning, which limits their real-world applicability. To address this, we introduce the first reasoning-driven benchmark integrating humanities and natural science knowledge, accompanied by the Knowledge Checklist Score, a novel quantitative metric that systematically evaluates cross-domain implicit reasoning capabilities, including counterfactual, causal, and cultural metaphorical reasoning. Methodologically, we propose a semantic-consistency-based knowledge assessment framework coupled with multi-dimensional prompt design, enabling a comprehensive comparative evaluation of 21 state-of-the-art models. Empirical results reveal that closed-source autoregressive models (e.g., GPT-4o) significantly outperform open-source diffusion models in knowledge integration and logical reasoning. This work establishes a reproducible evaluation standard and identifies key improvement pathways toward cognitively enhanced text-to-image systems.
Abstract
Recent advances in text-to-image (T2I) generation have achieved impressive results, yet existing models still struggle with prompts that require rich world knowledge and implicit reasoning, both of which are critical for producing semantically accurate, coherent, and contextually appropriate images in real-world scenarios. To address this gap, we introduce WorldGenBench, a benchmark designed to systematically evaluate T2I models' world knowledge grounding and implicit inferential capabilities, covering both the humanities and nature domains. We propose the Knowledge Checklist Score, a structured metric that measures how well generated images satisfy key semantic expectations. Experiments across 21 state-of-the-art models reveal that while diffusion models lead among open-source methods, proprietary auto-regressive models like GPT-4o exhibit significantly stronger reasoning and knowledge integration. Our findings highlight the need for deeper understanding and inference capabilities in next-generation T2I systems. Project Page: https://dwanzhang-ai.github.io/WorldGenBench/
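As a rough illustration of how a checklist-style metric can be computed, the sketch below scores one image as the fraction of its checklist items judged satisfied, then averages over images. This is an assumption about the general shape of such a metric, not the authors' exact definition: the abstract does not specify the formula, the judge, or any weighting, and the function names `checklist_score` and `mean_checklist_score` are hypothetical.

```python
from typing import Sequence


def checklist_score(satisfied: Sequence[bool]) -> float:
    """Fraction of checklist items judged satisfied for a single image.

    Assumption: every item carries equal weight. The benchmark's actual
    scoring rule may differ.
    """
    if not satisfied:
        raise ValueError("checklist must contain at least one item")
    return sum(satisfied) / len(satisfied)


def mean_checklist_score(per_image: Sequence[Sequence[bool]]) -> float:
    """Average the per-image scores over a set of generated images."""
    if not per_image:
        raise ValueError("need at least one image")
    scores = [checklist_score(items) for items in per_image]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical judgments: one inner list per image, one bool per item.
    judgments = [
        [True, False, True, True],
        [True, True],
    ]
    print(mean_checklist_score(judgments))
```

In this sketch the per-item judgments are supplied as booleans; in practice they would come from whatever evaluator (human or model) decides whether an image meets each expectation.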