SIQA: Toward Reliable Scientific Image Quality Assessment

📅 2026-03-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing image quality assessment methods overlook the core requirements of scientific images, namely scientific correctness and logical completeness, and instead focus narrowly on perceptual fidelity or text-image alignment. This work proposes SIQA, the first multidimensional quality assessment framework tailored to scientific images, which decomposes quality into a knowledge dimension (scientific validity and scientific completeness) and a perceptual dimension (cognitive clarity and disciplinary conformity). Two evaluation protocols are introduced: a multiple-choice semantic understanding task (SIQA-U) and an expert-score alignment task (SIQA-S). Using multimodal large language models and an expert-annotated benchmark, experiments show that current models achieve strong score alignment under SIQA-S but remain substantially weaker at semantic understanding under SIQA-U, and that fine-tuning improves scoring faster than it improves understanding. These findings underscore the necessity of multidimensional evaluation and validate the effectiveness of SIQA.
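As a rough illustration of the quality decomposition described above, the sketch below encodes the two dimensions and their four sub-dimensions as a plain Python mapping and averages per-sub-dimension ratings into an overall score. The dictionary layout, the 1-5 rating scale, and the equal weighting are illustrative assumptions; the summary does not specify how, or whether, sub-scores are aggregated.

```python
# Minimal sketch (assumed structure, not the authors' implementation):
# the SIQA rubric as two dimensions with two sub-dimensions each.
SIQA_DIMENSIONS = {
    "Knowledge": ["Scientific Validity", "Scientific Completeness"],
    "Perception": ["Cognitive Clarity", "Disciplinary Conformity"],
}

def overall_score(ratings: dict[str, float]) -> float:
    """Average the four sub-dimension ratings (equal weighting is an assumption)."""
    all_subs = [sub for pair in SIQA_DIMENSIONS.values() for sub in pair]
    return sum(ratings[sub] for sub in all_subs) / len(all_subs)

# Hypothetical expert annotation for one scientific figure (1-5 scale assumed):
ratings = {
    "Scientific Validity": 4.0,
    "Scientific Completeness": 3.0,
    "Cognitive Clarity": 5.0,
    "Disciplinary Conformity": 4.0,
}
print(overall_score(ratings))  # 4.0
```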

📝 Abstract
Scientific images fundamentally differ from natural and AI-generated images in that they encode structured domain knowledge rather than merely depict visual scenes. Assessing their quality therefore requires evaluating not only perceptual fidelity but also scientific correctness and logical completeness. However, existing image quality assessment (IQA) paradigms primarily focus on perceptual distortions or image-text alignment, implicitly assuming that depicted content is factually valid. This assumption breaks down in scientific contexts, where visually plausible figures may still contain conceptual errors or incomplete reasoning. To address this gap, we introduce Scientific Image Quality Assessment (SIQA), a framework that models scientific image quality along two complementary dimensions: Knowledge (Scientific Validity and Scientific Completeness) and Perception (Cognitive Clarity and Disciplinary Conformity). To operationalize this formulation, we design two evaluation protocols: SIQA-U (Understanding), which measures semantic comprehension of scientific content through multiple-choice tasks, and SIQA-S (Scoring), which evaluates alignment with expert quality judgments. We further construct the SIQA Challenge, consisting of an expert-annotated benchmark and a large-scale training set. Experiments across representative multimodal large language models (MLLMs) reveal a consistent discrepancy between scoring alignment and scientific understanding. While models can achieve strong agreement with expert ratings under SIQA-S, their performance on SIQA-U remains substantially lower. Fine-tuning improves both metrics, yet gains in scoring consistently outpace improvements in understanding. These results suggest that rating consistency alone may not reliably reflect scientific comprehension, underscoring the necessity of multidimensional evaluation for scientific image quality assessment.
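To make the two protocols described in the abstract concrete, the sketch below shows one plausible way their metrics could be computed: multiple-choice accuracy for SIQA-U, and rank and linear correlation (SRCC and PLCC, standard alignment measures in IQA) against expert ratings for SIQA-S. The function names, input formats, and the choice of correlation statistics are assumptions; the abstract states only that SIQA-S evaluates agreement with expert quality judgments.

```python
# Illustrative sketch (not the authors' code) of how the two SIQA protocols
# could be scored, assuming simple list-based inputs.
from scipy.stats import pearsonr, spearmanr

def siqa_u_accuracy(predicted_choices, correct_choices):
    """SIQA-U: fraction of multiple-choice questions answered correctly."""
    assert len(predicted_choices) == len(correct_choices)
    hits = sum(p == c for p, c in zip(predicted_choices, correct_choices))
    return hits / len(correct_choices)

def siqa_s_alignment(model_scores, expert_scores):
    """SIQA-S: rank (SRCC) and linear (PLCC) correlation with expert ratings."""
    srcc, _ = spearmanr(model_scores, expert_scores)
    plcc, _ = pearsonr(model_scores, expert_scores)
    return {"SRCC": srcc, "PLCC": plcc}

# Hypothetical toy data for illustration:
print(siqa_u_accuracy(["A", "C", "B"], ["A", "B", "B"]))        # ~0.667
print(siqa_s_alignment([3.1, 4.5, 2.0, 4.8], [3, 5, 2, 4]))
```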
Problem

Research questions and friction points this paper is trying to address.

Scientific Image Quality Assessment
Scientific Validity
Scientific Completeness
Perceptual Fidelity
Multimodal Evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Scientific Image Quality Assessment
Multidimensional Evaluation
Scientific Validity
Multimodal LLMs
Expert-Annotated Benchmark