SOG: One LLM Token for Explicit Graph Structural Understanding

📅 2026-02-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the structural hallucination problem that large language models (LLMs) often exhibit when processing graph-structured data, a limitation exacerbated by existing approaches that either introduce redundant textual representations or suffer from misalignment between soft prompts and text tokens. To overcome this, the authors propose a topology-aware structural tokenizer that maps an entire graph's topology into a single, interpretable special token <SOG_k>, enabling explicit and compact encoding of graph structure. Furthermore, they align the graph and textual token spaces through a hybrid structural question-answering corpus. This approach embeds both global and local graph information into a single token, achieving performance gains of 9.9%–41.4% across five graph-level benchmarks—significantly outperforming baseline methods—while offering strong interpretability, consistency, and seamless extensibility to node-level tasks.

📝 Abstract
Large language models show great potential in unstructured data understanding, but still face significant challenges with graphs due to their structural hallucination. Existing approaches mainly either verbalize graphs into natural language, which leads to excessive token consumption and scattered attention, or transform graphs into trainable continuous embeddings (i.e., soft prompts), which exhibit severe misalignment with original text tokens. To solve this problem, we propose to incorporate one special token to fully represent the Structure Of Graph (SOG) within a unified token space, facilitating explicit topology input and structural information sharing. Specifically, we propose a topology-aware structural tokenizer that maps each graph topology into a highly selective single token. Afterwards, we construct a set of hybrid structure Question-Answering corpora to align new structural tokens with existing text tokens. With this approach, SOG empowers LLMs to understand, generate, and reason in a concise and accurate manner. Extensive experiments on five graph-level benchmarks demonstrate the superiority of our method, achieving a performance improvement of 9.9% to 41.4% compared to the baselines while exhibiting interpretability and consistency. Furthermore, our method provides a flexible extension to node-level tasks, enabling both global and local structural understanding. The codebase is publicly available at https://github.com/Jingyao-Wu/SOG.
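The abstract's core idea—mapping a whole graph's topology to one discrete token id—can be illustrated with a minimal vector-quantization-style sketch. This is a hypothetical toy, not the authors' code: the structural descriptor (sorted adjacency eigenvalues) and the codebook lookup are stand-ins for the learned, topology-aware tokenizer described in the paper.

```python
# Hypothetical sketch (NOT the paper's implementation): map a graph to a
# single discrete token <SOG_k> by (1) computing a fixed-size structural
# embedding and (2) snapping it to the nearest entry in a codebook.
import numpy as np

def graph_structural_embedding(adj: np.ndarray, dim: int = 4) -> np.ndarray:
    """Toy structural descriptor: the top eigenvalues of the adjacency
    matrix, zero-padded to a fixed dimension. (An assumption; the paper
    learns this representation rather than using spectra directly.)"""
    eig = np.sort(np.linalg.eigvalsh(adj))[::-1]  # descending order
    vec = np.zeros(dim)
    vec[: min(dim, eig.size)] = eig[:dim]
    return vec

def tokenize(adj: np.ndarray, codebook: np.ndarray) -> int:
    """Return the index k of the nearest codebook entry, i.e. <SOG_k>."""
    z = graph_structural_embedding(adj)
    dists = np.linalg.norm(codebook - z, axis=1)
    return int(np.argmin(dists))

# Two toy graphs: a triangle and a 3-node path.
triangle = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)

# A tiny 2-entry codebook built from the two example topologies.
codebook = np.stack([graph_structural_embedding(triangle),
                     graph_structural_embedding(path)])

print(f"<SOG_{tokenize(triangle, codebook)}>")  # -> <SOG_0>
print(f"<SOG_{tokenize(path, codebook)}>")      # -> <SOG_1>
```

In the paper's actual pipeline the resulting `<SOG_k>` id would be registered as a new special token in the LLM's vocabulary and aligned with text tokens via the QA corpus; the sketch only shows the graph-to-token mapping step.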
Problem

Research questions and friction points this paper is trying to address.

graph understanding
structural hallucination
large language models
topology representation
graph reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

<SOG_k>
structural tokenization
graph understanding
large language models
topology-aware representation
Jingyao Wu
MIT-Novo Nordisk AI Postdoctoral Fellow, MIT Media Lab
emotion recognition · affective computing · machine learning · speech processing · time series analysis
Bin Lu
Shanghai Jiao Tong University
graph neural networks · spatiotemporal data mining · AI for Science · GeoAI
Zijun Di
Shanghai Jiao Tong University
Xiaoying Gan
Shanghai Jiao Tong University
Meng Jin
Shanghai Jiao Tong University
wireless communication · backscatter · RFID
Luoyi Fu
Shanghai Jiao Tong University
Xinbing Wang
Shanghai Jiao Tong University
Chenghu Zhou
IGSNRR, Chinese Academy of Sciences