G2rammar: Bilingual Grammar Modeling for Enhanced Text-attributed Graph Learning

📅 2025-11-02
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing text-attributed graph learning methods linearize graph structures into sequences without capturing syntactic cues that characterize nodes’ structural roles (e.g., hubs, bridges, peripheries), hindering large language models from effectively modeling graph topology. To address this, we propose G2rammar—a novel framework introducing a dual-track grammar system: *structural grammar*, which explicitly encodes nodes’ functional roles within the graph, and *semantic grammar*, which integrates textual node attributes. G2rammar employs random-walk-based linearization, centrality analysis, and neighborhood pattern recognition to realize a two-stage learning paradigm: structural grammar pretraining followed by semantic grammar fine-tuning. Extensive experiments on multiple real-world benchmarks demonstrate that G2rammar significantly outperforms state-of-the-art baselines, validating the efficacy of syntax-aware modeling for enhancing graph representation learning.
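The summary above describes random-walk-based linearization combined with centrality analysis to tag nodes with structural roles (hubs, bridges, peripheries). A minimal sketch of those two pieces is below; the role names, the hub threshold, the articulation-point test for "bridge", and the `node:role` token format are all illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch (not G2rammar's implementation): assign coarse
# structural roles from degree centrality and articulation points, then
# linearize the graph into role-annotated tokens via a random walk.
import random
from collections import deque

def connected_after_removal(adj, removed):
    """BFS check: does the graph stay connected if `removed` is deleted?"""
    nodes = [n for n in adj if n != removed]
    if not nodes:
        return True
    seen = {nodes[0]}
    queue = deque(seen)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v != removed and v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(nodes)

def annotate_roles(adj, hub_quantile=0.9):
    """Tag each node hub / bridge / periphery (one possible
    'structural grammar'; thresholds are assumptions)."""
    n = len(adj)
    cent = {u: len(adj[u]) / (n - 1) for u in adj}  # degree centrality
    cut = sorted(cent.values())[int(hub_quantile * (n - 1))]
    roles = {}
    for u in adj:
        if cent[u] >= cut:
            roles[u] = "hub"
        elif not connected_after_removal(adj, u):
            roles[u] = "bridge"  # articulation point: removal disconnects
        else:
            roles[u] = "periphery"
    return roles

def random_walk_tokens(adj, start, length, roles, seed=0):
    """Linearize the graph as a sequence of role-annotated node tokens."""
    rng = random.Random(seed)
    walk, node = [], start
    for _ in range(length):
        walk.append(f"{node}:{roles[node]}")
        nbrs = sorted(adj[node])
        if not nbrs:
            break
        node = rng.choice(nbrs)
    return walk

# Example: two 4-cliques joined through node 4, which becomes a bridge.
edges = [(a, b) for a in range(4) for b in range(a + 1, 4)]
edges += [(a, b) for a in range(5, 9) for b in range(a + 1, 9)]
edges += [(3, 4), (4, 5)]
adj = {u: set() for u in range(9)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)
roles = annotate_roles(adj)
# roles[4] == "bridge"; roles[3] and roles[5] are "hub"
```

On this toy graph the clique-to-path connectors get the highest degree centrality ("hub"), the path node is an articulation point ("bridge"), and the remaining clique members fall to "periphery".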

📝 Abstract
Text-attributed graphs require models to effectively integrate both structural topology and semantic content. Recent approaches apply large language models to graphs by linearizing structures into token sequences through random walks. These methods create concise graph vocabularies to replace verbose natural language descriptions. However, they overlook a critical component that makes language expressive: grammar. In natural language, grammar assigns syntactic roles to words and defines their functions within sentences. Similarly, nodes in graphs play distinct structural roles as hubs, bridges, or peripheral members. Current graph language methods provide tokens without grammatical annotations to indicate these structural or semantic roles. This absence limits language models' ability to reason about graph topology effectively. We propose **G2rammar**, a bilingual grammar framework that explicitly encodes both structural and semantic grammar for text-attributed graphs. Structural grammar characterizes topological roles through centrality and neighborhood patterns. Semantic grammar captures content relationships through textual informativity. The framework implements two-stage learning with structural grammar pre-training followed by semantic grammar fine-tuning. Extensive experiments on real-world datasets demonstrate that G2rammar consistently outperforms competitive baselines by providing language models with the grammatical context needed to understand graph structures.
Problem

Research questions and friction points this paper is trying to address.

Modeling structural and semantic grammar for text-attributed graphs
Enhancing language models' reasoning about graph topology
Encoding topological roles and content relationships explicitly
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bilingual grammar framework encodes structural and semantic grammar
Structural grammar characterizes topological roles via centrality patterns
Two-stage learning: structural grammar pre-training followed by semantic grammar fine-tuning
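The two-stage paradigm above can be sketched as a data-preparation step: stage one trains on structure-only token sequences (structural grammar), stage two appends each node's textual attributes (semantic grammar). The `<role>` markers and sequence format are hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical sketch of two-stage sequence construction (not the
# paper's actual token format).
def stage1_sequence(walk_roles):
    """Structural-grammar pre-training input: roles and node ids only.
    `walk_roles` is a list of (node_id, role) pairs from a random walk."""
    return " ".join(f"<{role}> n{n}" for n, role in walk_roles)

def stage2_sequence(walk_roles, texts):
    """Semantic-grammar fine-tuning input: same walk, plus each node's
    textual attribute from the `texts` mapping (node_id -> text)."""
    return " ".join(f"<{role}> n{n}: {texts[n]}" for n, role in walk_roles)

walk = [(0, "hub"), (1, "periphery")]
print(stage1_sequence(walk))
# <hub> n0 <periphery> n1
print(stage2_sequence(walk, {0: "GNN survey", 1: "citation note"}))
# <hub> n0: GNN survey <periphery> n1: citation note
```

The split mirrors the summary's claim that topology is learned first and textual semantics are layered on afterwards.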
Heng Zheng, South China Normal University
Haochen You, Columbia University
Zijun Liu, Tsinghua University
Zijian Zhang, University of Pennsylvania
Lubin Gan, University of Science and Technology of China
Hao Zhang, University of Chinese Academy of Sciences
Wenjun Huang, Sun Yat-sen University
Jin Huang, South China Normal University