🤖 AI Summary
Existing text-attributed graph learning methods linearize graph structures into sequences without capturing the syntactic cues that characterize nodes' structural roles (e.g., hubs, bridges, peripheral nodes), which hinders large language models from effectively modeling graph topology. To address this, we propose G2rammar, a novel framework built on a dual-track grammar system: *structural grammar*, which explicitly encodes nodes' functional roles within the graph, and *semantic grammar*, which integrates textual node attributes. G2rammar employs random-walk-based linearization, centrality analysis, and neighborhood pattern recognition in a two-stage learning paradigm: structural grammar pre-training followed by semantic grammar fine-tuning. Extensive experiments on multiple real-world benchmarks demonstrate that G2rammar significantly outperforms state-of-the-art baselines, validating the efficacy of syntax-aware modeling for graph representation learning.
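To make the centrality-based role analysis concrete, here is a minimal sketch of how structural roles (hub, bridge, peripheral) could be derived from centrality scores. The thresholds and role labels are illustrative assumptions for this sketch, not the paper's actual tagging scheme:

```python
import networkx as nx

def structural_roles(G):
    # Illustrative role tagging (not the paper's exact scheme):
    # hub = unusually high degree centrality,
    # bridge = above-average betweenness centrality,
    # everything else = peripheral.
    deg = nx.degree_centrality(G)
    btw = nx.betweenness_centrality(G)
    deg_mean = sum(deg.values()) / len(deg)
    btw_mean = sum(btw.values()) / len(btw)
    roles = {}
    for n in G.nodes:
        if deg[n] > 2 * deg_mean:
            roles[n] = "hub"
        elif btw[n] > btw_mean:
            roles[n] = "bridge"
        else:
            roles[n] = "peripheral"
    return roles

# A star (node 0 as the hub) with a path hanging off leaf node 4.
G = nx.star_graph(4)
G.add_edges_from([(4, 5), (5, 6), (6, 7)])
print(structural_roles(G))
```

On this toy graph, node 0 is tagged as a hub, the connector nodes 4-6 as bridges, and the leaves as peripheral; such tags are what a grammar-aware linearization could attach to node tokens.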
📝 Abstract
Text-attributed graphs require models to effectively integrate both structural topology and semantic content. Recent approaches apply large language models to graphs by linearizing structures into token sequences through random walks. These methods create concise graph vocabularies to replace verbose natural language descriptions. However, they overlook a critical component that makes language expressive: grammar. In natural language, grammar assigns syntactic roles to words and defines their functions within sentences. Similarly, nodes in graphs play distinct structural roles as hubs, bridges, or peripheral members. Current graph language methods provide tokens without grammatical annotations to indicate these structural or semantic roles. This absence limits language models' ability to reason about graph topology effectively. We propose **G2rammar**, a bilingual grammar framework that explicitly encodes both structural and semantic grammar for text-attributed graphs. Structural grammar characterizes topological roles through centrality and neighborhood patterns. Semantic grammar captures content relationships through textual informativity. The framework implements two-stage learning with structural grammar pre-training followed by semantic grammar fine-tuning. Extensive experiments on real-world datasets demonstrate that G2rammar consistently outperforms competitive baselines by providing language models with the grammatical context needed to understand graph structures.
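The random-walk linearization described above can be sketched as follows. The token format and the degree-based tag scheme (`<hub>`/`<mid>`/`<leaf>`) are stand-in assumptions to show the idea of interleaving grammatical annotations with node tokens, not the paper's actual vocabulary:

```python
import random
import networkx as nx

def linearize(G, start, length=6, seed=0):
    # Random-walk linearization with inline grammar tags.
    # The degree-threshold tags below are hypothetical placeholders
    # for the structural-grammar annotations discussed in the text.
    def tag(n):
        d = G.degree(n)
        return "<hub>" if d >= 3 else ("<leaf>" if d == 1 else "<mid>")

    rng = random.Random(seed)
    tokens, node = [], start
    for _ in range(length):
        tokens += [f"v{node}", tag(node)]
        nbrs = list(G.neighbors(node))
        if not nbrs:
            break
        node = rng.choice(nbrs)
    return " ".join(tokens)

G = nx.star_graph(3)          # node 0 has degree 3
print(linearize(G, start=0))  # e.g. "v0 <hub> v2 <leaf> v0 <hub> ..."
```

Each node token is followed by its role tag, so a language model consuming the sequence sees not just which nodes a walk visits but what structural function each one serves.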