LEGO-Learn: Label-Efficient Graph Open-Set Learning

πŸ“… 2024-10-21
πŸ›οΈ arXiv.org
πŸ“ˆ Citations: 5
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Addressing the dual challenges of low labeling cost and out-of-distribution (OOD) detection in open-set graph learning, this paper proposes a collaborative GNN framework. First, it designs a GNN-based OOD filter trained jointly with a (C+1)-way classifier to perform in-distribution (ID) classification and OOD detection simultaneously. Second, it introduces a K-Medoids–based active sampling strategy that selects highly informative ID nodes, minimizing annotation effort. Third, it defines a weighted cross-entropy loss that suppresses interference from OOD samples while strengthening supervision on critical ID samples. Extensive experiments on four real-world graph datasets show up to a 6.62% improvement in ID classification accuracy and up to a 7.49% gain in OOD detection AUROC, significantly outperforming existing low-budget open-set graph learning approaches.
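The weighted cross-entropy idea can be sketched in a few lines. Note the per-class weight values below are illustrative assumptions; the summary does not specify the paper's exact weighting scheme:

```python
import math

def weighted_ce(probs, label, class_weights):
    """Weighted cross-entropy for one sample under a (C+1)-way classifier.

    probs: predicted probabilities over the C ID classes plus one OOD class.
    label: integer in [0, C]; index C denotes the extra OOD class.
    class_weights: per-class weights. A weight < 1 on the OOD class
    suppresses interference from suspected OOD samples, while larger
    weights on ID classes strengthen their supervision (illustrative
    scheme, not the paper's exact formulation).
    """
    return -class_weights[label] * math.log(probs[label])

# Example with C = 3 ID classes + 1 OOD class
probs = [0.7, 0.1, 0.1, 0.1]
weights = [1.0, 1.0, 1.0, 0.5]          # down-weight the OOD class
loss_id = weighted_ce(probs, 0, weights)   # confident ID prediction
loss_ood = weighted_ce(probs, 3, weights)  # suspected OOD sample
```

Averaging this per-sample loss over a labeled batch gives the training objective; the OOD-class weight controls how aggressively the filter's (C+1)-th class can pull nodes away from ID supervision.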

πŸ“ Abstract
How can we train graph-based models to recognize unseen classes while keeping labeling costs low? Graph open-set learning (GOL) and out-of-distribution (OOD) detection address this challenge by training models that accurately classify known, in-distribution (ID) classes while identifying and handling previously unseen classes at inference time. This capability is critical for high-stakes, real-world applications where models frequently encounter unexpected data, such as finance, security, and healthcare. However, current GOL methods assume access to many labeled ID samples, which is unrealistic for large-scale graphs due to high annotation costs. In this paper, we propose LEGO-Learn (Label-Efficient Graph Open-set Learning), a novel framework that tackles open-set node classification on graphs within a given label budget by selecting the most informative ID nodes. LEGO-Learn employs a GNN-based filter to identify and exclude potential OOD nodes and then selects highly informative ID nodes for labeling using the K-Medoids algorithm. To prevent the filter from discarding valuable ID examples, we introduce a classifier that differentiates between the C known ID classes and an additional class representing OOD nodes (hence, a C+1 classifier). This classifier uses a weighted cross-entropy loss to balance the removal of OOD nodes against the retention of informative ID nodes. Experimental results on four real-world datasets demonstrate that LEGO-Learn significantly outperforms leading methods, with up to a 6.62% improvement in ID classification accuracy and a 7.49% increase in AUROC for OOD detection.
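The K-Medoids selection step mentioned in the abstract can be sketched as a minimal alternating K-Medoids routine over node embeddings. This is an illustrative stand-in, not the paper's implementation; in the pipeline, only nodes kept by the OOD filter would be passed in, and the returned medoid indices are the nodes sent for labeling:

```python
import random

def k_medoids_select(feats, k, iters=10, seed=0):
    """Pick k representative node indices (medoids) from feature vectors.

    Minimal alternating K-Medoids sketch: assign each node to its
    nearest medoid, then re-pick each medoid as the cluster member
    minimizing total intra-cluster distance, until stable.
    """
    def dist(a, b):
        # squared Euclidean distance between two embeddings
        return sum((x - y) ** 2 for x, y in zip(a, b))

    rng = random.Random(seed)
    medoids = rng.sample(range(len(feats)), k)
    for _ in range(iters):
        # assign each node to its nearest medoid
        clusters = {m: [] for m in medoids}
        for i, f in enumerate(feats):
            nearest = min(medoids, key=lambda m: dist(f, feats[m]))
            clusters[nearest].append(i)
        # re-pick each medoid as the member minimizing total distance
        new_medoids = [
            min(members, key=lambda c: sum(dist(feats[c], feats[j]) for j in members))
            for members in clusters.values() if members
        ]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return sorted(medoids)
```

Because medoids are actual nodes (unlike K-Means centroids), each selected index corresponds to a real node whose label can be requested from an annotator, which is why K-Medoids suits label-budgeted selection.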
Problem

Research questions and friction points this paper is trying to address.

Train graph models to recognize unseen classes with minimal labeling
Balance accurate ID classification and OOD detection under label constraints
Reduce annotation costs while maintaining performance in real-world applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

GNN-based filter excludes potential OOD nodes
K-Medoids algorithm selects informative ID nodes
C+1 classifier balances OOD removal and ID retention
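Taken together, the components above suggest a single labeling-round loop: filter out likely-OOD nodes, spend the label budget on representative survivors, and return the labeled set for training. The callables `ood_score`, `select_ids`, and `annotate` below are hypothetical stand-ins for the filter, the K-Medoids selector, and the human annotator, not the paper's API:

```python
def lego_learn_round(feats, budget, ood_score, select_ids, annotate, threshold=0.5):
    """Schematic LEGO-Learn labeling round (illustrative sketch).

    ood_score(f)      -> estimated probability that a node is OOD
                         (e.g. from the GNN-based C+1 filter)
    select_ids(fs, k) -> indices of k representative nodes (e.g. K-Medoids)
    annotate(i)       -> oracle label for node index i
    """
    # Step 1: exclude nodes the filter flags as likely OOD
    keep = [i for i, f in enumerate(feats) if ood_score(f) < threshold]
    # Step 2: pick `budget` informative candidates among the survivors
    chosen = select_ids([feats[i] for i in keep], budget)
    # Step 3: spend the label budget only on those nodes
    return {keep[c]: annotate(keep[c]) for c in chosen}
```

The returned dictionary of node-to-label pairs would then supervise the (C+1)-way classifier under the weighted cross-entropy loss.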
πŸ”Ž Similar Papers
No similar papers found.