LLM as GNN: Graph Vocabulary Learning for Text-Attributed Graph Foundation Models

📅 2025-03-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing methods for text-attributed graphs (TAGs) suffer from semantic fragmentation, vocabulary explosion, and incompatibility with prompt templates—stemming from two-stage alignment architectures and out-of-vocabulary (OOV) node tokens. To address these issues, this paper proposes PromptGFM, a graph foundation model grounded in large language models (LLMs). Its core innovation is the novel “LLM-as-GNN” paradigm: a graph understanding module emulates GNN-style message passing directly in textual space, coupled with a linguistically grounded graph vocabulary to yield semantically coherent, interpretable, and transferable graph encodings. PromptGFM integrates prompt engineering, structure-text mapping learning, and task-oriented instruction tuning. Evaluated on diverse multi-source TAG benchmarks, it achieves significant gains in node classification and link prediction, demonstrates strong cross-graph generalization, and maintains robust compatibility with heterogeneous prompt templates. The code is publicly available.

📝 Abstract
Text-Attributed Graphs (TAGs), where each node is associated with text descriptions, are ubiquitous in real-world scenarios. They typically exhibit distinctive structure and domain-specific knowledge, motivating the development of a Graph Foundation Model (GFM) that generalizes across diverse graphs and tasks. Despite large efforts to integrate Large Language Models (LLMs) and Graph Neural Networks (GNNs) for TAGs, existing approaches suffer from decoupled architectures with two-stage alignment, limiting their synergistic potential. Even worse, existing methods assign out-of-vocabulary (OOV) tokens to graph nodes, leading to graph-specific semantics, token explosion, and incompatibility with task-oriented prompt templates, which hinders cross-graph and cross-task transferability. To address these challenges, we propose PromptGFM, a versatile GFM for TAGs grounded in graph vocabulary learning. PromptGFM comprises two key components: (1) Graph Understanding Module, which explicitly prompts LLMs to replicate the finest GNN workflow within the text space, facilitating seamless GNN-LLM integration and elegant graph-text alignment; (2) Graph Inference Module, which establishes a language-based graph vocabulary ensuring expressiveness, transferability, and scalability, enabling readable instructions for LLM fine-tuning. Extensive experiments demonstrate our superiority and transferability across diverse graphs and tasks. The code is available at: https://github.com/agiresearch/PromptGFM.
Problem

Research questions and friction points this paper is trying to address.

Integrate LLMs and GNNs for Text-Attributed Graphs effectively.
Address out-of-vocabulary token issues in graph node representation.
Enhance cross-graph and cross-task transferability in graph models.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph Understanding Module prompts LLMs to replicate the GNN message-passing workflow in text space
Graph Inference Module establishes a language-based graph vocabulary for readable LLM fine-tuning instructions
PromptGFM enables cross-graph and cross-task transferability
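The "LLM-as-GNN" idea behind the Graph Understanding Module can be illustrated with a minimal sketch: instead of numeric feature aggregation, each node's textual description is iteratively refined by prompting an LLM to summarize the node together with its neighbors' texts, mirroring a GNN's aggregate-and-update step. This is a hypothetical illustration, not the paper's implementation; `llm` stands in for a real LLM API and is replaced here by a trivial stub so the sketch runs.

```python
def text_message_passing(adj, texts, llm, num_layers=2):
    """Refine each node's textual representation over several rounds by
    prompting an LLM to aggregate the node's text with its neighbors'
    texts, mimicking GNN message passing in text space (hypothetical
    sketch of the LLM-as-GNN paradigm)."""
    reps = dict(texts)  # current textual representation per node
    for _ in range(num_layers):
        new_reps = {}
        for node, neighbors in adj.items():
            neighbor_msgs = "; ".join(reps[n] for n in neighbors)
            prompt = (
                f"Node description: {reps[node]}\n"
                f"Neighbor descriptions: {neighbor_msgs}\n"
                "Summarize this node in the context of its neighbors."
            )
            new_reps[node] = llm(prompt)  # LLM plays the aggregate/update role
        reps = new_reps  # synchronous update, as in a GNN layer
    return reps

def stub_llm(prompt):
    """Stand-in for a real LLM call: flattens and truncates the prompt."""
    return prompt.replace("\n", " ")[:80]

if __name__ == "__main__":
    # Toy citation graph: three papers linked in a chain.
    adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
    texts = {"a": "paper on GNNs", "b": "paper on LLMs", "c": "paper on prompts"}
    print(text_message_passing(adj, texts, stub_llm, num_layers=1))
```

With a real LLM in place of `stub_llm`, each round produces a readable, language-native node representation, which is what makes the resulting graph vocabulary transferable across graphs and compatible with prompt templates.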
Authors
Xi Zhu (Rutgers University)
Haochen Xue (University of Liverpool)
Ziwei Zhao (University of Science and Technology of China)
Wujiang Xu (Rutgers, Meta, Ant Group)
Jingyuan Huang (Rutgers University)
Minghao Guo (Rutgers University)
Qifan Wang (Meta AI)
Kaixiong Zhou (North Carolina State University)
Yongfeng Zhang (Rutgers University)