Large Language Model-based Augmentation for Imbalanced Node Classification on Text-Attributed Graphs

📅 2024-10-22

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the long-tailed node classification problem in Text-Attributed Graphs (TAGs), this work pioneers the integration of large language models (LLaMA-2/Qwen) into graph imbalance learning. We propose a semantic interpolation-based prompting strategy to generate high-quality textual features for minority classes and design a text-driven link prediction mechanism to ensure semantic consistency and topological plausibility between generated text and the original graph structure. Our method jointly incorporates a BERT-based link predictor, text embedding alignment, and a GNN classifier (GCN or GraphSAGE). Evaluated on four real-world TAG benchmarks, it achieves an average 12.6% improvement in macro-F1 score and up to a 23.4% gain in recall for extremely rare classes—significantly outperforming state-of-the-art text augmentation and graph imbalance learning approaches.

Technology Category

Application Category

📝 Abstract

Node classification on graphs often suffers from class imbalance, leading to biased predictions and significant risks in real-world applications. While data-centric solutions have been explored, they largely overlook Text-Attributed Graphs (TAGs) and the potential of using rich textual semantics to improve the classification of minority nodes. Given this gap, we propose Large Language Model-based Augmentation on Text-Attributed Graphs (LA-TAG), a novel framework that leverages Large Language Models (LLMs) to handle imbalanced node classification. Specifically, we develop prompting strategies inspired by interpolation to synthesize textual node attributes. Additionally, to effectively integrate synthetic nodes into the graph structure, we introduce a textual link predictor that connects the generated nodes to the original graph, preserving structural and contextual information. Experiments across various datasets and evaluation metrics demonstrate that LA-TAG outperforms existing textual augmentation and graph imbalance learning methods, emphasizing the efficacy of our approach in addressing class imbalance in TAGs.

Problem

Research questions and friction points this paper is trying to address.

Imbalanced Classes

Node Classification

Text Attribute Graphs

Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Model

Text Attribute Graph Augmentation

Imbalanced Node Classification

🔎 Similar Papers

No similar papers found.

Authors to Follow