🤖 AI Summary
This work addresses the challenge of node-level out-of-distribution (OOD) detection in text-attributed graphs by proposing a novel approach that integrates large language models (LLMs) with energy-based contrastive learning. The method leverages LLMs to generate high-quality pseudo-OOD samples that capture both semantic and structural characteristics, and employs an energy-based contrastive learning framework to model the decision boundary between in-distribution and out-of-distribution nodes. According to the authors, this is the first study to incorporate LLMs into graph-based OOD detection, substantially enhancing model robustness against distributional shifts. Extensive experiments on six benchmark datasets demonstrate that the proposed method achieves superior OOD detection performance while maintaining high node-classification accuracy compared to existing approaches.
📝 Abstract
Text-attributed graphs, where nodes are enriched with textual attributes, have become a powerful tool for modeling real-world networks such as citation, social, and transaction networks. However, existing methods for learning from these graphs typically assume that the training and testing data are drawn from the same distribution, an assumption that leads to significant performance degradation when out-of-distribution (OOD) data is encountered. In this paper, we address the challenge of node-level OOD detection in text-attributed graphs, with the goal of maintaining accurate node classification while simultaneously identifying OOD nodes. We propose a novel approach, LLM-Enhanced Energy Contrastive Learning for Out-of-Distribution Detection in Text-Attributed Graphs (LECT), which integrates large language models (LLMs) with energy-based contrastive learning. The method first generates high-quality pseudo-OOD samples by leveraging the semantic understanding and contextual knowledge of LLMs to create dependency-aware pseudo-OOD nodes, and then applies contrastive learning over energy functions to distinguish in-distribution (IND) from OOD nodes. The effectiveness of our method is demonstrated through extensive experiments on six benchmark datasets, where it consistently outperforms state-of-the-art baselines, achieving both high classification accuracy and robust OOD detection.
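The abstract does not spell out LECT's exact energy function or contrastive loss. As an illustrative sketch only, a common choice in energy-based OOD detection is the free-energy score over a node's classifier logits, E(x) = -T·logsumexp(f(x)/T): in-distribution nodes with a confident class prediction get low energy, while near-uniform (OOD-like) logits get high energy. The function names, the temperature `T`, and the threshold below are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def energy_score(logits, T=1.0):
    # Free-energy OOD score: E(x) = -T * logsumexp(logits / T).
    # Computed with the max-subtraction trick for numerical stability.
    z = np.asarray(logits, dtype=float) / T
    m = z.max(axis=-1, keepdims=True)
    return -T * (np.squeeze(m, axis=-1) + np.log(np.exp(z - m).sum(axis=-1)))

def detect_ood(logits, threshold):
    # Flag nodes whose energy exceeds a threshold (hypothetically
    # tuned on a validation split containing pseudo-OOD samples).
    return energy_score(logits) > threshold

# A confident in-distribution node vs. a flat, uncertain (OOD-like) node.
id_logits = np.array([[8.0, 0.5, 0.3]])   # one dominant class -> low energy
ood_logits = np.array([[0.4, 0.5, 0.3]])  # near-uniform logits -> high energy
```

In a setup like LECT's, such scores would be computed on GNN logits, with the contrastive objective pushing the energies of IND nodes and LLM-generated pseudo-OOD nodes apart so that a single threshold separates them.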