SketchGraphNet: A Memory-Efficient Hybrid Graph Transformer for Large-Scale Sketch Corpora Recognition

📅 2026-03-08
🤖 AI Summary
This work addresses the challenge of structured modeling in large-scale freehand sketch recognition by proposing a graph-native approach that directly represents sketches as spatiotemporal graphs, circumventing reliance on image rasterization or stroke sequence conversion. The authors introduce a hybrid graph neural network architecture that integrates local message passing with a memory-efficient global attention mechanism (MemEffAttn), enabling accurate and efficient recognition without auxiliary positional or structural encodings. Evaluated on SketchGraph—a newly constructed benchmark comprising 3.44 million graph-structured sketches across 344 categories—the model achieves Top-1 accuracies of 83.62% and 87.61% on SketchGraph-A and SketchGraph-R, respectively. Moreover, it reduces peak GPU memory consumption by over 40% and training time by more than 30% compared to models using Performer-based attention.

📝 Abstract
This work investigates large-scale sketch recognition from a graph-native perspective, where freehand sketches are directly modeled as structured graphs rather than raster images or stroke sequences. We propose SketchGraphNet, a hybrid graph neural architecture that integrates local message passing with a memory-efficient global attention mechanism (MemEffAttn), without relying on auxiliary positional or structural encodings. To support systematic evaluation, we construct SketchGraph, a large-scale benchmark comprising 3.44 million graph-structured sketches across 344 categories, with two variants (A and R) reflecting different noise conditions. Each sketch is represented as a spatiotemporal graph with normalized stroke-order attributes. On SketchGraph-A and SketchGraph-R, SketchGraphNet achieves Top-1 accuracies of 83.62% and 87.61%, respectively, under a unified training configuration. MemEffAttn further reduces peak GPU memory by over 40% and training time by more than 30% compared with Performer-based global attention, while maintaining comparable accuracy.
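The paper's implementation is not reproduced on this page. As a rough illustration of the hybrid design the abstract describes (local message passing blended with a memory-efficient global attention pass), the following NumPy sketch processes queries in chunks so that only a small slice of the attention score matrix is materialized at a time. All names here (`hybrid_layer`, `chunk`, `alpha`) are hypothetical and are not taken from the paper.

```python
import numpy as np

def local_message_passing(X, adj):
    """Mean-aggregate each node's neighbors (with a self-loop)."""
    out = np.zeros_like(X)
    for i, nbrs in enumerate(adj):
        idx = list(nbrs) + [i]  # include the node itself
        out[i] = X[idx].mean(axis=0)
    return out

def chunked_global_attention(X, chunk=2):
    """Softmax self-attention over all nodes, computed one query
    chunk at a time so peak memory is O(chunk * n), not O(n^2)."""
    n, d = X.shape
    scale = 1.0 / np.sqrt(d)
    out = np.empty_like(X)
    for s in range(0, n, chunk):
        q = X[s:s + chunk]                            # (c, d)
        scores = q @ X.T * scale                      # (c, n)
        scores -= scores.max(axis=1, keepdims=True)   # numerical stability
        w = np.exp(scores)
        w /= w.sum(axis=1, keepdims=True)
        out[s:s + chunk] = w @ X                      # (c, d)
    return out

def hybrid_layer(X, adj, alpha=0.5):
    """Blend the local (graph) and global (attention) branches."""
    return alpha * local_message_passing(X, adj) + \
           (1.0 - alpha) * chunked_global_attention(X)
```

Chunking changes only the memory profile, not the result: attention computed chunk by chunk is numerically identical to the full pass, which is the same trade-off the abstract attributes to MemEffAttn versus Performer-style approximations.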
Problem

Research questions and friction points this paper is trying to address.

large-scale sketch recognition
graph-native representation
memory-efficient attention
sketch corpora
structured graph modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

graph-native representation
memory-efficient attention
hybrid graph transformer
large-scale sketch recognition
spatiotemporal graph
Shilong Chen
College of Computer, Qinghai Normal University, Xining, 810001, Qinghai, China
Mingyuan Li
College of Computer, Nanjing University, Nanjing, 210023, Jiangsu, China
Zhaoyang Wang
University of North Carolina at Chapel Hill
NLP · LLM Alignment · LLM Reasoning
Zhonglin Ye
College of Computer, Qinghai Normal University, Xining, 810001, Qinghai, China
Haixing Zhao
College of Computer, Qinghai Minzu University, Xining, 810007, Qinghai, China