🤖 AI Summary
This work investigates whether large language models (LLMs) can mitigate catastrophic forgetting in graph continual learning (GCL). Addressing critical limitations of existing evaluation paradigms, namely task-ID leakage and misalignment with real-world deployment constraints, we propose SimGCL, a lightweight, rehearsal-free method that jointly leverages task-aware prompt engineering and graph-topology-aware embedding alignment, achieving a roughly 20% performance gain without accessing historical data. Furthermore, we introduce LLM4GCL, the first dedicated LLM benchmark for GCL, featuring standardized training and evaluation pipelines, diverse multi-source graph datasets, and reproducible evaluation protocols. Extensive experiments demonstrate that SimGCL significantly outperforms state-of-the-art GNN-based baselines across mainstream GCL benchmarks. The LLM4GCL benchmark is publicly released to foster rigorous, reproducible assessment and to advance LLM-augmented GCL methods.
📝 Abstract
Nowadays, real-world data, including graph-structured data, often arrives in a streaming manner, which means that learning systems need to continuously acquire new knowledge without forgetting previously learned information. Although many existing works attempt to address catastrophic forgetting in graph machine learning, they are all based on training from scratch with streaming data. With the rise of pretrained models, an increasing number of studies have leveraged their strong generalization ability for continual learning. Therefore, in this work, we attempt to answer whether large language models (LLMs) can mitigate catastrophic forgetting in Graph Continual Learning (GCL). We first point out that current experimental setups for GCL have significant flaws, as the evaluation stage may leak task IDs. Then, we evaluate the performance of LLMs in more realistic scenarios and find that even minor modifications can lead to outstanding results. Finally, based on extensive experiments, we propose a simple yet effective method, Simple Graph Continual Learning (SimGCL), which surpasses the previous state-of-the-art GNN-based baseline by around 20% under the rehearsal-free constraint. To facilitate reproducibility, we have developed an easy-to-use benchmark, LLM4GCL, for training and evaluating existing GCL methods. The code is available at: https://github.com/ZhixunLEE/LLM4GCL.
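To make the task-ID leakage point concrete: if the evaluation stage reveals each test node's task ID and restricts prediction to that task's classes (the task-incremental protocol), the problem is far easier than the realistic class-incremental setting, where the model must discriminate among all classes seen so far. The NumPy sketch below simulates this with a synthetic scorer; it is purely illustrative (the toy sizes, seed, and score boost are our assumptions, not details from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 3 tasks x 2 classes each (6 classes total), 100 test nodes per task.
# "logits" stand in for a trained model's class scores; the true class gets a
# small boost, so the model is imperfect but better than chance.
n_tasks, classes_per_task, n_nodes = 3, 2, 100
n_classes = n_tasks * classes_per_task
labels = rng.integers(0, n_classes, size=n_tasks * n_nodes)
logits = rng.normal(size=(labels.size, n_classes))
logits[np.arange(labels.size), labels] += 1.0

# Class-incremental (realistic): predict over ALL classes seen so far.
acc_cil = np.mean(np.argmax(logits, axis=1) == labels)

# Task-incremental (leaky): the evaluator reveals each node's task ID and
# masks the logits down to that task's 2 classes -- a much easier decision.
task_ids = labels // classes_per_task
masked = np.full_like(logits, -np.inf)
for t in range(n_tasks):
    idx = task_ids == t
    lo, hi = t * classes_per_task, (t + 1) * classes_per_task
    masked[idx, lo:hi] = logits[idx, lo:hi]
acc_til = np.mean(np.argmax(masked, axis=1) == labels)

print(f"class-incremental accuracy: {acc_cil:.2f}")
print(f"task-incremental accuracy:  {acc_til:.2f}")
```

Running this, the task-incremental accuracy comes out substantially higher for the very same scores, which is why an evaluation protocol that leaks task IDs can overstate how well a GCL method resists forgetting.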