🤖 AI Summary
This work addresses the structural hallucination problem that large language models (LLMs) often exhibit when processing graph-structured data, a limitation exacerbated by existing approaches that either introduce redundant textual representations or suffer from misalignment between soft prompts and text tokens. To overcome this, the authors propose a topology-aware structural tokenizer that maps an entire graph’s topology into a single, interpretable special token 〈SOG_k〉, enabling explicit and compact encoding of graph structure. Furthermore, they align the graph and textual token spaces through a hybrid structural question-answering corpus. This approach uniquely embeds both global and local graph information into a single token, achieving performance gains of 9.9%–41.4% across five graph-level benchmarks—significantly outperforming baseline methods—while offering strong interpretability, consistency, and seamless extensibility to node-level tasks.
📝 Abstract
Large language models show great potential in unstructured data understanding, but still face significant challenges with graphs due to structural hallucination. Existing approaches mainly either verbalize graphs into natural language, which leads to excessive token consumption and scattered attention, or transform graphs into trainable continuous embeddings (i.e., soft prompts), which exhibit severe misalignment with the original text tokens. To solve this problem, we propose to incorporate one special token that fully represents the Structure Of Graph (SOG) within a unified token space, facilitating explicit topology input and structural information sharing. Specifically, we propose a topology-aware structural tokenizer that maps each graph topology to a highly selective single token. We then construct a set of hybrid structural question-answering corpora to align the new structural tokens with existing text tokens. This approach empowers LLMs to understand, generate, and reason about graph structure in a concise and accurate manner. Extensive experiments on five graph-level benchmarks demonstrate the superiority of our method, which achieves performance improvements of 9.9% to 41.4% over the baselines while exhibiting interpretability and consistency. Furthermore, our method extends flexibly to node-level tasks, enabling both global and local structural understanding. The codebase is publicly available at https://github.com/Jingyao-Wu/SOG.
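The abstract describes a tokenizer that maps a whole graph topology to one discrete special token 〈SOG_k〉. The paper's actual tokenizer is a learned graph encoder with a trained codebook; the sketch below is only a minimal, illustrative stand-in that substitutes a hand-crafted degree-histogram signature for the learned encoder and a nearest-neighbor codebook lookup for the trained quantizer. The names `topology_signature`, `graph_to_token`, and the toy codebook are hypothetical, not from the paper.

```python
# Illustrative sketch of a topology-to-single-token mapping (assumption:
# graphs are given as adjacency lists; the real SOG tokenizer uses a
# learned graph encoder and a trained vector-quantized codebook).

def topology_signature(adj, bins=4):
    """Fixed-length topology signature: a normalized degree histogram."""
    degrees = [len(nbrs) for nbrs in adj.values()]
    max_deg = max(degrees) if degrees else 1
    hist = [0.0] * bins
    for d in degrees:
        idx = min(int(d / (max_deg + 1) * bins), bins - 1)
        hist[idx] += 1.0
    n = len(degrees) or 1
    return [h / n for h in hist]

def quantize(sig, codebook):
    """Pick the nearest codebook entry, yielding one discrete index k."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(sig, codebook[k]))

def graph_to_token(adj, codebook):
    """Collapse an entire graph's topology into a single special token."""
    k = quantize(topology_signature(adj), codebook)
    return f"<SOG_{k}>"

# Toy usage: a path graph and a star graph land on different tokens,
# so the LLM receives a distinct, compact symbol per topology class.
codebook = [[0.0, 0.5, 0.5, 0.0], [0.0, 0.75, 0.0, 0.25]]
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(graph_to_token(path, codebook))  # -> <SOG_0>
print(graph_to_token(star, codebook))  # -> <SOG_1>
```

Once each graph is reduced to one such token, it can be spliced into an ordinary text prompt, which is what allows the structural QA corpora to align the new token with existing text tokens during training.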