AI Summary
Existing visual graph recognition methods are often confined to specific tasks and transfer poorly across scenarios. This work proposes GraSP (Graph Recognition via Subgraph Prediction), an end-to-end framework based on subgraph prediction that jointly models graph structure and visual features, so that diverse graph types and rendering styles can be recognized within a single approach. GraSP transfers between tasks without task-specific modifications, positioning it as a general-purpose, transferable approach to visual graph recognition. Evaluated on several synthetic benchmarks and one real-world application, GraSP generalizes across diverse graph types and drawings, moving the field toward a unified framework for graph recognition.
Abstract
Despite tremendous improvements in tasks such as image classification, object detection, and segmentation, the recognition of visual relationships, commonly modeled as the extraction of a graph from an image, remains challenging. We believe this stems mainly from the lack of a canonical way to approach the visual graph recognition task. Most existing solutions are problem-specific and cannot be transferred between contexts out of the box, even though the underlying conceptual problem is the same. With broad applicability and simplicity in mind, in this paper we develop Graph Recognition via Subgraph Prediction (GraSP), a method for recognizing graphs in images. We show across several synthetic benchmarks and one real-world application that our method handles diverse types of graphs and their drawings, and can be transferred between tasks without task-specific modifications, paving the way toward a more unified framework for visual graph recognition.