🤖 AI Summary
Existing text-attributed graph learning methods linearize graph structures into sequences without capturing the syntactic cues that characterize nodes' structural roles (e.g., hubs, bridges, peripheral nodes), which hinders large language models from effectively modeling graph topology. To address this, we propose G2rammar, a novel framework built on a dual-track grammar system: *structural grammar*, which explicitly encodes nodes' functional roles within the graph, and *semantic grammar*, which integrates textual node attributes. G2rammar employs random-walk-based linearization, centrality analysis, and neighborhood pattern recognition in a two-stage learning paradigm: structural grammar pre-training followed by semantic grammar fine-tuning. Extensive experiments on multiple real-world benchmarks demonstrate that G2rammar significantly outperforms state-of-the-art baselines, validating the efficacy of syntax-aware modeling for graph representation learning.
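To make the centrality-based role analysis concrete, here is a minimal sketch of how structural roles (hub, bridge, peripheral) could be derived from centrality scores. The thresholds and role labels are illustrative assumptions for this sketch, not the paper's actual tagging scheme:

```python
import networkx as nx

def structural_roles(G):
    # Illustrative role tagging (not the paper's exact scheme):
    # hub = unusually high degree centrality,
    # bridge = above-average betweenness centrality,
    # everything else = peripheral.
    deg = nx.degree_centrality(G)
    btw = nx.betweenness_centrality(G)
    deg_mean = sum(deg.values()) / len(deg)
    btw_mean = sum(btw.values()) / len(btw)
    roles = {}
    for n in G.nodes:
        if deg[n] > 2 * deg_mean:
            roles[n] = "hub"
        elif btw[n] > btw_mean:
            roles[n] = "bridge"
        else:
            roles[n] = "peripheral"
    return roles

# A star (node 0 as the hub) with a path hanging off leaf node 4.
G = nx.star_graph(4)
G.add_edges_from([(4, 5), (5, 6), (6, 7)])
print(structural_roles(G))
```

On this toy graph, node 0 is tagged as a hub, the connector nodes 4-6 as bridges, and the leaves as peripheral; such tags are what a grammar-aware linearization could attach to node tokens.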
📝 Abstract
Text-attributed graphs require models to effectively integrate both structural topology and semantic content. Recent approaches apply large language models to graphs by linearizing structures into token sequences through random walks. These methods create concise graph vocabularies to replace verbose natural language descriptions. However, they overlook a critical component that makes language expressive: grammar. In natural language, grammar assigns syntactic roles to words and defines their functions within sentences. Similarly, nodes in graphs play distinct structural roles as hubs, bridges, or peripheral members. Current graph language methods provide tokens without grammatical annotations to indicate these structural or semantic roles. This absence limits language models' ability to reason about graph topology effectively. We propose **G2rammar**, a bilingual grammar framework that explicitly encodes both structural and semantic grammar for text-attributed graphs. Structural grammar characterizes topological roles through centrality and neighborhood patterns. Semantic grammar captures content relationships through textual informativity. The framework implements two-stage learning with structural grammar pre-training followed by semantic grammar fine-tuning. Extensive experiments on real-world datasets demonstrate that G2rammar consistently outperforms competitive baselines by providing language models with the grammatical context needed to understand graph structures.
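The random-walk linearization described above can be sketched as follows. The token format and the degree-based tag scheme (`<hub>`/`<mid>`/`<leaf>`) are stand-in assumptions to show the idea of interleaving grammatical annotations with node tokens, not the paper's actual vocabulary:

```python
import random
import networkx as nx

def linearize(G, start, length=6, seed=0):
    # Random-walk linearization with inline grammar tags.
    # The degree-threshold tags below are hypothetical placeholders
    # for the structural-grammar annotations discussed in the text.
    def tag(n):
        d = G.degree(n)
        return "<hub>" if d >= 3 else ("<leaf>" if d == 1 else "<mid>")

    rng = random.Random(seed)
    tokens, node = [], start
    for _ in range(length):
        tokens += [f"v{node}", tag(node)]
        nbrs = list(G.neighbors(node))
        if not nbrs:
            break
        node = rng.choice(nbrs)
    return " ".join(tokens)

G = nx.star_graph(3)          # node 0 has degree 3
print(linearize(G, start=0))  # e.g. "v0 <hub> v2 <leaf> v0 <hub> ..."
```

Each node token is followed by its role tag, so a language model consuming the sequence sees not just which nodes a walk visits but what structural function each one serves.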