SciFlow-Bench: Evaluating Structure-Aware Scientific Diagram Generation via Inverse Parsing

📅 2026-02-10

🤖 AI Summary

Current text-to-image models often produce visually plausible scientific figures that nonetheless contain structural errors, and structure-aware evaluation methods for pixel-level outputs are lacking. To address this gap, this work proposes SciFlow-Bench, a benchmark built around an evaluation paradigm centered on structural recoverability. It constructs a closed-loop assessment framework by extracting figures and their corresponding ground-truth graph structures from real scientific PDFs. A hierarchical multi-agent system reverse-engineers the generated images and reconstructs their underlying structures, enabling black-box evaluation of structural fidelity rather than mere visual similarity. Experiments show that existing models struggle to preserve structural correctness in figures with complex topology, underscoring the need for this evaluation framework and providing a benchmark for future research.
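The summary says structural fidelity is judged by comparing a reconstructed graph against a ground-truth graph, not by pixel similarity. The sketch below shows one way such a comparison could work, assuming node and directed-edge F1 over canonicalized labels. The summary does not specify SciFlow-Bench's actual metrics or label normalization, so `canonical_label`, `DiagramGraph`, and `structural_scores` are hypothetical names chosen for illustration.

```python
import re
from dataclasses import dataclass


def canonical_label(text: str) -> str:
    """Assumed normalization: collapse whitespace, trim edge punctuation, lowercase."""
    return re.sub(r"\s+", " ", text).strip().strip(".,;:").lower()


@dataclass(frozen=True)
class DiagramGraph:
    """Hypothetical canonical graph: node labels plus directed (source, target) edges."""
    nodes: frozenset[str]
    edges: frozenset[tuple[str, str]]

    @classmethod
    def from_lists(cls, nodes, edges):
        return cls(
            frozenset(canonical_label(n) for n in nodes),
            frozenset((canonical_label(a), canonical_label(b)) for a, b in edges),
        )


def f1(gold: frozenset, pred: frozenset) -> float:
    """Set-level F1; two empty sets count as a perfect match."""
    if not gold and not pred:
        return 1.0
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def structural_scores(gold: DiagramGraph, recovered: DiagramGraph) -> dict[str, float]:
    return {"node_f1": f1(gold.nodes, recovered.nodes),
            "edge_f1": f1(gold.edges, recovered.edges)}


# Toy example: every box is recovered, but one arrow points the wrong way.
gold = DiagramGraph.from_lists(
    ["Text Encoder", "Diffusion U-Net", "VAE Decoder"],
    [("Text Encoder", "Diffusion U-Net"), ("Diffusion U-Net", "VAE Decoder")],
)
recovered = DiagramGraph.from_lists(
    ["text encoder", "Diffusion U-Net", "VAE Decoder."],
    [("Text Encoder", "Diffusion U-Net"), ("VAE Decoder", "Diffusion U-Net")],
)
print(structural_scores(gold, recovered))  # {'node_f1': 1.0, 'edge_f1': 0.5}
```

The example shows why node-level agreement alone is not enough: a figure with every labeled box in place can still score poorly once edge direction is checked.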

📝 Abstract

Scientific diagrams convey explicit structural information, yet modern text-to-image models often produce visually plausible but structurally incorrect results. Existing benchmarks either rely on image-centric or subjective metrics insensitive to structure, or evaluate intermediate symbolic representations rather than final rendered images, leaving pixel-based diagram generation underexplored. We introduce SciFlow-Bench, a structure-first benchmark for evaluating scientific diagram generation directly from pixel-level outputs. Built from real scientific PDFs, SciFlow-Bench pairs each source framework figure with a canonical ground-truth graph and evaluates models as black-box image generators under a closed-loop, round-trip protocol that inverse-parses generated diagram images back into structured graphs for comparison. This design enforces evaluation by structural recoverability rather than visual similarity alone, and is enabled by a hierarchical multi-agent system that coordinates planning, perception, and structural reasoning. Experiments show that preserving structural correctness remains a fundamental challenge, particularly for diagrams with complex topology, underscoring the need for structure-aware evaluation.
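The round-trip protocol treats the image generator as a black box and routes its output through an inverse parser that coordinates planning, perception, and structural reasoning. The following is a minimal sketch of that loop under stated assumptions. The three stages are modeled as pluggable callables, the score is an assumed Jaccard overlap of directed edge sets, and `HierarchicalParser`, `round_trip_score`, and the stub components are hypothetical. In the paper these roles are filled by a multi-agent system whose interfaces the abstract does not describe.

```python
from dataclasses import dataclass
from typing import Any, Callable

Edge = tuple[str, str]  # directed (source, target) between node labels


@dataclass
class HierarchicalParser:
    """Hypothetical inverse parser split into the three roles named in the abstract."""
    plan: Callable[[Any], list[str]]            # planning: choose regions to inspect
    perceive: Callable[[Any, list[str]], dict]  # perception: read boxes/arrows there
    reason: Callable[[dict], set[Edge]]         # structural reasoning: build the graph

    def __call__(self, image: Any) -> set[Edge]:
        regions = self.plan(image)
        observations = self.perceive(image, regions)
        return self.reason(observations)


def round_trip_score(prompt: str, gold_edges: set[Edge],
                     generate: Callable[[str], Any],
                     parse: Callable[[Any], set[Edge]]) -> float:
    """Closed loop: prompt -> generated image -> recovered graph -> compare with gold.

    Only the generator's output image is used (black-box access). The score is
    the Jaccard overlap of directed edge sets, an assumed metric.
    """
    image = generate(prompt)
    recovered = parse(image)
    union = gold_edges | recovered
    if not union:
        return 1.0
    return len(gold_edges & recovered) / len(union)


def fake_generator(prompt: str) -> dict:
    """Stand-in model: returns pre-detected elements instead of real pixels."""
    return {"boxes": ["Encoder", "Decoder", "Loss"],
            "arrows": [("Encoder", "Decoder")]}


parser = HierarchicalParser(
    plan=lambda img: ["whole_canvas"],
    perceive=lambda img, regions: {"arrows": img["arrows"]},
    reason=lambda obs: set(obs["arrows"]),
)
gold = {("Encoder", "Decoder"), ("Decoder", "Loss")}
score = round_trip_score("Draw an encoder-decoder pipeline with a loss.",
                         gold, fake_generator, parser)
print(score)  # 0.5: all boxes are drawn, but the Decoder -> Loss arrow is missing
```

Keeping the generator and parser as separate arguments mirrors the black-box framing: any image model can be swapped in without the evaluator inspecting its internals or intermediate symbolic output.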

Problem

scientific diagram generation

structural correctness

evaluation benchmark

pixel-level output

inverse parsing

Innovation

structure-aware evaluation

inverse parsing

scientific diagram generation