🤖 AI Summary
This work addresses the challenge of generating publication-quality scientific diagrams from incomplete hand-drawn sketches, where existing methods struggle to preserve semantic content and topological structure together. The authors propose DiagramRAG, a lightweight retrieval-augmented framework that, for the first time, integrates knowledge graphs with multi-granularity sketch variants to build a structure-aware retrieval mechanism. The approach represents diagram semantics as a knowledge graph, synthesizes sketch variants at several simplification levels, and trains an embedding model that aligns sketches with reference diagrams, both structurally and semantically, in a shared embedding space. The retrieved references then supply content, topology, and visual priors that guide generation. On DiagramBank and FigureBench, the method achieves F1 scores of 0.848 and 0.802, respectively, along with a VLM-as-a-Judge score of 7.170 and a per-sample inference latency of 35.48 seconds.
📝 Abstract
Scientific diagrams are essential for communicating complex methodologies in academic papers. A natural way for researchers to specify such diagrams is through rough sketches, where text labels, connectors, and spatial arrangements express early semantic and topological intentions. However, sketches are usually incomplete, making them insufficient for directly producing publication-quality diagrams. Existing sketch-based generation methods mainly reconstruct the sketch itself, while recent text-driven diagram generation frameworks rely on textual semantics and do not fully exploit the topological structure contained in sketches. In this paper, we introduce DiagramRAG, a lightweight retrieval-augmented framework for sketch-based scientific diagram completion. Given a user sketch, DiagramRAG retrieves reference diagrams that are both semantically relevant to the sketch content and topologically compatible with its structure, and uses them to guide downstream diagram generation. To enable efficient structure-aware retrieval, we represent diagrams as knowledge graphs, synthesize sketch variants at different simplification levels, and train an embedding model to align sketches with compatible diagrams in a shared space. The retrieved references further provide content, topology, and visual priors for completing and rendering the final diagram. Experiments show that DiagramRAG achieves F1-scores of 0.848 and 0.802 on DiagramBank and FigureBench, respectively, and improves generation quality with the best VLM-as-a-Judge score of 7.170, while reducing inference latency to 35.48 seconds per sample. Our code and data are available at https://anonymous.4open.science/r/DiagramRAG-A262 and https://huggingface.co/datasets/anonymous-review-a262/DiagramSketch.
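To make the retrieval idea concrete, the following is a minimal sketch of the three ingredients the abstract names: diagrams as knowledge graphs, sketch variants at different simplification levels, and nearest-neighbor lookup in a shared space. It is not the authors' implementation. The triple format, the function names (`simplify`, `featurize`, `retrieve`), the random edge-dropping rule, and the two toy diagrams are all assumptions made for illustration, and a plain bag-of-features vector stands in for the trained embedding model.

```python
import math
import random
from collections import Counter

# Assumption: a diagram's knowledge graph is a list of (head, relation, tail) triples.
Triple = tuple[str, str, str]


def simplify(graph: list[Triple], keep_ratio: float, seed: int = 0) -> list[Triple]:
    """Return a sketch-like variant that keeps only a fraction of the edges.

    Hypothetical simplification rule: drop edges uniformly at random.
    """
    rng = random.Random(seed)
    n_keep = min(len(graph), max(1, round(len(graph) * keep_ratio)))
    return rng.sample(graph, n_keep)


def featurize(graph: list[Triple]) -> Counter:
    """Toy stand-in for a learned embedding: sparse counts of nodes and edges."""
    feats: Counter = Counter()
    for head, relation, tail in graph:
        feats[f"node:{head}"] += 1                    # semantic content (labels)
        feats[f"node:{tail}"] += 1
        feats[f"edge:{head}-{relation}->{tail}"] += 1  # topology (connections)
    return feats


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(value * b[key] for key, value in a.items())  # missing keys count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(sketch: list[Triple], bank: dict[str, list[Triple]], k: int = 1) -> list[str]:
    """Return the names of the k reference diagrams most similar to the sketch."""
    query = featurize(sketch)
    ranked = sorted(bank, key=lambda name: cosine(query, featurize(bank[name])), reverse=True)
    return ranked[:k]


# Two made-up reference diagrams with disjoint vocabularies.
bank = {
    "rag_pipeline": [
        ("query", "feeds", "retriever"),
        ("corpus", "indexes", "retriever"),
        ("retriever", "feeds", "generator"),
        ("generator", "outputs", "answer"),
    ],
    "gan_training": [
        ("noise", "feeds", "generator_net"),
        ("generator_net", "outputs", "fake_image"),
        ("fake_image", "feeds", "discriminator"),
        ("real_image", "feeds", "discriminator"),
    ],
}

# Sketch variants at three simplification levels, all derived from one diagram.
variants = {ratio: simplify(bank["rag_pipeline"], ratio) for ratio in (0.25, 0.5, 0.75)}
results = {ratio: retrieve(sketch, bank) for ratio, sketch in variants.items()}
```

In this toy setup every variant shares at least one edge with its source diagram and nothing with the other, so each lookup ranks the source first. A trained embedding model would replace `featurize` and would need to handle the harder case where labels are paraphrased or the sketch matches several references only partially.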