🤖 AI Summary
Large language models (LLMs) struggle to model graph-structured data effectively: they lack a built-in inductive bias for graphs and cannot natively capture structural dependencies.
Method: This paper introduces GDL4LLM (Graph-Defined Language for Large Language Model), a paradigm that encodes graph structure as learnable, language-like sequences via structure-aware tokenization and neighborhood-topology-driven graph-to-sequence mapping, yielding compact yet expressive representations of high-order graph structure. LLMs are then pre-trained and fine-tuned directly on these graph-language sequences, letting them model graph topology end-to-end without relying on verbose textual descriptions or node/edge attribute embeddings.
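The paper does not spell out the graph-to-sequence mapping in this summary, but a common way to realize "neighborhood-topology-driven" sequence generation is to sample random walks and emit each walk as a sentence of dedicated node tokens. The sketch below is a minimal illustration under that assumption; the token format `<n{id}>`, walk length, and walks-per-node are all hypothetical choices, not the authors' settings.

```python
import random

def random_walks(adj, walk_len=8, walks_per_node=2, seed=0):
    """Sample fixed-length random walks from every node of an adjacency-list
    graph. Each walk becomes one 'sentence' of the graph-language corpus,
    with node ids rendered as dedicated tokens like '<n42>' (hypothetical
    token format)."""
    rng = random.Random(seed)
    corpus = []
    for start in adj:
        for _ in range(walks_per_node):
            walk, node = [start], start
            for _ in range(walk_len - 1):
                nbrs = adj[node]
                if not nbrs:          # dead end: stop this walk early
                    break
                node = rng.choice(nbrs)
                walk.append(node)
            corpus.append(" ".join(f"<n{v}>" for v in walk))
    return corpus

# Tiny undirected example graph as an adjacency list
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
corpus = random_walks(adj)
```

A corpus built this way stays concise: a handful of short walks summarizes a node's high-order neighborhood in a few dozen tokens, versus the paragraphs a textual graph description would need.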
Contribution/Results: Evaluated on three real-world graph datasets, GDL4LLM consistently outperforms description-based and attribute-embedding baselines, improving node classification accuracy by 4.2% on average. The authors present it as the first approach that lets LLMs model graph structure of arbitrary order both efficiently and end-to-end.
📝 Abstract
Recent efforts leverage Large Language Models (LLMs) for modeling text-attributed graph structures in node classification tasks. These approaches either describe graph structures in text for LLMs to understand, or aggregate LLM-generated textual attribute embeddings along the graph structure. However, they face two main limitations in modeling graph structures with LLMs: (i) graph descriptions become verbose when describing high-order graph structure, and (ii) textual attributes alone do not carry adequate graph structure information. It is therefore challenging to model graph structure both concisely and adequately with LLMs, which lack built-in mechanisms for modeling graph structure directly and struggle with complex long-range dependencies between high-order nodes and target nodes. Inspired by the observation that LLMs pre-trained on one language can achieve exceptional performance on another with minimal additional training, we propose Graph-Defined Language for Large Language Model (GDL4LLM). This novel framework enables LLMs to transfer their powerful language understanding capabilities to graph-structured data. GDL4LLM translates graphs into a graph language corpus instead of graph descriptions and pre-trains LLMs on this corpus to adequately understand graph structures. During fine-tuning, this corpus describes the structural information of target nodes concisely with only a few tokens. By treating graphs as a new language, GDL4LLM enables LLMs to model graph structures adequately and concisely for node classification tasks. Extensive experiments on three real-world datasets demonstrate that GDL4LLM outperforms description-based and textual-attribute-embedding-based baselines by efficiently modeling different orders of graph structure with LLMs.
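Treating graphs as a new language implies giving the LLM new tokens: one plausible reading is that each node gets a fresh entry appended to the model's existing vocabulary, with the corresponding embedding rows learned during graph-language pretraining. The sketch below illustrates that idea with a plain dictionary; the base vocabulary size of 32000 and the `<n{id}>` token format are assumptions for illustration, not details from the paper.

```python
def build_graph_vocab(num_nodes, base_vocab_size=32000):
    """Assign each graph node a new token id placed after the LLM's existing
    vocabulary (base size is a hypothetical placeholder). In practice the
    model's embedding matrix would be extended by num_nodes rows, trained
    during graph-language pretraining."""
    return {f"<n{i}>": base_vocab_size + i for i in range(num_nodes)}

def encode_walk(walk_sentence, vocab):
    """Map a whitespace-separated graph-language sentence to token ids."""
    return [vocab[tok] for tok in walk_sentence.split()]

vocab = build_graph_vocab(4)
ids = encode_walk("<n0> <n2> <n3>", vocab)
```

Because each node is a single token, a target node's structural context fits in a few tokens of prompt during fine-tuning, in contrast to the verbose natural-language graph descriptions the abstract criticizes.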