🤖 AI Summary
To address the long-tailed node classification problem in Text-Attributed Graphs (TAGs), this work pioneers the integration of large language models (LLaMA-2/Qwen) into graph imbalance learning. We propose a semantic interpolation-based prompting strategy to generate high-quality textual features for minority classes and design a text-driven link prediction mechanism to ensure semantic consistency and topological plausibility between generated text and the original graph structure. Our method jointly incorporates a BERT-based link predictor, text embedding alignment, and a GNN classifier (GCN or GraphSAGE). Evaluated on four real-world TAG benchmarks, it achieves an average 12.6% improvement in macro-F1 score and up to a 23.4% gain in recall for extremely rare classes—significantly outperforming state-of-the-art text augmentation and graph imbalance learning approaches.
📝 Abstract
Node classification on graphs often suffers from class imbalance, leading to biased predictions and significant risks in real-world applications. While data-centric solutions have been explored, they largely overlook Text-Attributed Graphs (TAGs) and the potential of using rich textual semantics to improve the classification of minority nodes. Given this gap, we propose Large Language Model-based Augmentation on Text-Attributed Graphs (LA-TAG), a novel framework that leverages Large Language Models (LLMs) to handle imbalanced node classification. Specifically, we develop prompting strategies inspired by interpolation to synthesize textual node attributes. Additionally, to effectively integrate synthetic nodes into the graph structure, we introduce a textual link predictor that connects the generated nodes to the original graph, preserving structural and contextual information. Experiments across various datasets and evaluation metrics demonstrate that LA-TAG outperforms existing textual augmentation and graph imbalance learning methods, emphasizing the efficacy of our approach in addressing class imbalance in TAGs.