Knowledge Visualization: A Benchmark and Method for Knowledge-Intensive Text-to-Image Generation

📅 2026-04-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the difficulty existing text-to-image generation models have in ensuring both scientific accuracy and visual plausibility in knowledge-intensive scenarios, where they often violate domain-specific knowledge, structural constraints, and symbolic conventions. The work presents the first systematic formulation and evaluation of this task, introducing KVBench, a knowledge visualization benchmark comprising 1,800 expert-curated prompts spanning six senior high-school subjects. To improve scientific fidelity, the authors propose KE-Check, a two-stage framework that first enriches the prompt through knowledge elaboration and then uses a structured checklist to identify violations and guide image editing. An evaluation of 14 models reveals deficiencies in logical reasoning, symbolic precision, and multilingual robustness; KE-Check mitigates scientific hallucinations and narrows the performance gap between open-source and closed-source models.

📝 Abstract

Recent text-to-image (T2I) models have demonstrated impressive capabilities in photorealistic synthesis and instruction following. However, their reliability in knowledge-intensive settings remains largely unexplored. Unlike natural image generation, knowledge visualization requires not only semantic alignment but also strict adherence to domain knowledge, structural constraints, and symbolic conventions, exposing a critical gap between visual plausibility and scientific correctness. To systematically study this problem, we introduce KVBench, a curriculum-grounded benchmark for evaluating knowledge-intensive T2I generation. KVBench covers six senior high-school subjects: Biology, Chemistry, Geography, History, Mathematics, and Physics. The benchmark consists of 1,800 expert-curated prompts derived from over 30 authoritative textbooks. Using this benchmark, we evaluate 14 state-of-the-art open- and closed-source models, revealing substantial deficiencies in logical reasoning, symbolic precision, and multilingual robustness, with open-source models consistently underperforming proprietary systems. To address these limitations, we further propose KE-Check, a two-stage framework that improves scientific fidelity via (1) Knowledge Elaboration for structured prompt enrichment, and (2) Checklist-Guided Refinement for explicit constraint enforcement through violation identification and constraint-guided editing. KE-Check effectively mitigates scientific hallucinations, narrowing the performance gap between open-source and leading closed-source models. Data and code are publicly available at https://github.com/zhaoran66/KVBench.
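
The two-stage flow described in the abstract can be pictured with a minimal Python sketch. This is not the authors' implementation: the function name `ke_check`, every callable it accepts (`elaborate`, `generate`, `build_checklist`, `is_satisfied`, `edit`), and the `max_rounds` cap are hypothetical placeholders introduced here for illustration. The abstract does not say how many refinement passes are run, so the loop bound is an assumption.

```python
from typing import Callable, List, Tuple, TypeVar

Image = TypeVar("Image")  # whatever image type the chosen T2I backend returns


def ke_check(
    prompt: str,
    elaborate: Callable[[str], str],               # hypothetical: knowledge elaboration
    generate: Callable[[str], Image],              # hypothetical: T2I model call
    build_checklist: Callable[[str], List[str]],   # hypothetical: constraint extraction
    is_satisfied: Callable[[Image, str], bool],    # hypothetical: per-constraint check
    edit: Callable[[Image, List[str]], Image],     # hypothetical: constraint-guided edit
    max_rounds: int = 2,                           # assumption: not stated in the abstract
) -> Tuple[Image, List[str]]:
    """Return the refined image and any constraints still violated."""
    # Stage 1: enrich the prompt with domain knowledge, then generate.
    enriched = elaborate(prompt)
    image = generate(enriched)

    # Stage 2: identify violated constraints and edit the image to fix them.
    checklist = build_checklist(enriched)
    violations = [c for c in checklist if not is_satisfied(image, c)]
    rounds = 0
    while violations and rounds < max_rounds:
        image = edit(image, violations)
        violations = [c for c in checklist if not is_satisfied(image, c)]
        rounds += 1
    return image, violations
```

Passing the stages in as callables keeps the sketch independent of any particular language model, image generator, or editor; returning the remaining violations makes it visible when refinement stops without satisfying every constraint.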

Problem

Research questions and friction points this paper is trying to address.

knowledge-intensive

text-to-image generation

scientific correctness

knowledge visualization

symbolic precision

Innovation

Methods, ideas, or system contributions that make the work stand out.

knowledge visualization

text-to-image generation

KVBench