Science-T2I: Addressing Scientific Illusions in Image Synthesis

📅 2025-04-17
🤖 AI Summary
Text-to-image models frequently produce images that violate physical, biological, or chemical principles, yet reliable mechanisms for evaluating and optimizing scientific fidelity are lacking. Method: We introduce Science-T2I, an expert-annotated adversarial dataset of 20k image pairs with 9k prompts, and propose SciScore, a CLIP-based end-to-end reward model that assesses the scientific plausibility of generated images and reaches human-level performance (a reported 5% improvement). We further design a two-stage training framework, consisting of supervised fine-tuning followed by masked online fine-tuning, to incorporate scientific knowledge into existing generative models. Contribution/Results: Applied to FLUX, the fine-tuning method improves performance on SciScore by more than 50%, and the framework sets a new standard for evaluating the scientific realism of generated images.

📝 Abstract
We present a novel approach to integrating scientific knowledge into generative models, enhancing their realism and consistency in image synthesis. First, we introduce Science-T2I, an expert-annotated adversarial dataset comprising 20k adversarial image pairs with 9k prompts and covering a wide range of distinct scientific knowledge categories. Leveraging Science-T2I, we present SciScore, an end-to-end reward model that refines the assessment of generated images based on scientific knowledge; it is obtained by augmenting both the scientific comprehension and the visual capabilities of a pre-trained CLIP model. Building on SciScore, we propose a two-stage training framework, comprising a supervised fine-tuning phase and a masked online fine-tuning phase, to incorporate scientific knowledge into existing generative models. Through comprehensive experiments, we demonstrate the effectiveness of our framework in establishing new standards for evaluating the scientific realism of generated content. Specifically, SciScore attains performance comparable to human level, with a 5% improvement that brings it in line with evaluations conducted by experienced human evaluators. Furthermore, applying our proposed fine-tuning method to FLUX yields a performance gain exceeding 50% on SciScore.
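
To make the idea of a CLIP-based reward model on adversarial pairs concrete, here is a minimal sketch of how such a model could compare two candidate images for one prompt. It is an illustration under stated assumptions, not the paper's implementation: the cosine-similarity score, the two-way softmax, the `temperature` value, and the toy embedding vectors are all hypothetical stand-ins for what a trained encoder would produce.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def preference(text_emb, img_a, img_b, temperature=0.07):
    """Probability that image A fits the prompt better than image B.

    Assumption: the reward is a CLIP-style cosine similarity, and the two
    rewards are compared with a temperature-scaled two-way softmax.
    """
    s_a = cosine(text_emb, img_a) / temperature
    s_b = cosine(text_emb, img_b) / temperature
    m = max(s_a, s_b)  # subtract the max for numerical stability
    e_a, e_b = math.exp(s_a - m), math.exp(s_b - m)
    return e_a / (e_a + e_b)


# Toy embeddings (hypothetical): one prompt, one scientifically plausible
# image, and one implausible image from the same adversarial pair.
prompt = [1.0, 0.0, 0.5]
plausible = [0.9, 0.1, 0.4]
implausible = [0.1, 0.9, 0.2]

p = preference(prompt, plausible, implausible)
print(f"P(plausible preferred) = {p:.3f}")
```

In a real system the embeddings would come from the fine-tuned image and text encoders; the comparison step itself stays this small.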

Problem

Research questions and friction points this paper is trying to address.

Enhancing realism in generative image synthesis using scientific knowledge
Developing SciScore to evaluate scientific accuracy of generated images
Improving generative models via two-stage training with scientific data

Innovation

Methods, ideas, or system contributions that make the work stand out.

Expert-annotated adversarial dataset Science-T2I
SciScore, an end-to-end reward model built by augmenting a pre-trained CLIP model
Two-stage training framework for generative models
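
One plausible way to use an adversarial dataset of this kind for evaluation is to check how often a scorer ranks the scientifically correct image above the incorrect one. The sketch below assumes that setup; the `AdversarialPair` record, its field names, the example prompts, and the lookup-table scorer are all invented for illustration and are not taken from Science-T2I.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class AdversarialPair:
    """Hypothetical record: one prompt with a correct and an incorrect image."""
    prompt: str
    correct_image: str    # image ID or path (illustrative)
    incorrect_image: str  # image ID or path (illustrative)


def pairwise_accuracy(
    pairs: Sequence[AdversarialPair],
    score: Callable[[str, str], float],
) -> float:
    """Fraction of pairs where the correct image gets the strictly higher score."""
    if not pairs:
        raise ValueError("pairs must not be empty")
    hits = sum(
        1
        for p in pairs
        if score(p.prompt, p.correct_image) > score(p.prompt, p.incorrect_image)
    )
    return hits / len(pairs)


# Toy data and a lookup-table scorer standing in for a real reward model.
pairs = [
    AdversarialPair("an ice cube left in the sun", "img_melted", "img_solid"),
    AdversarialPair("an iron nail left in water", "img_rusty", "img_shiny"),
]
toy_scores = {
    "img_melted": 0.8, "img_solid": 0.3,  # ranked correctly
    "img_rusty": 0.4, "img_shiny": 0.6,   # ranked incorrectly
}


def toy_score(prompt: str, image_id: str) -> float:
    return toy_scores[image_id]


accuracy = pairwise_accuracy(pairs, toy_score)
print(f"pairwise accuracy = {accuracy:.2f}")
```

Swapping `toy_score` for a function that wraps a trained reward model would give the same metric on real data.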