🤖 AI Summary
Graph neural networks (GNNs) generalize poorly under label scarcity and cross-domain transfer, and a semantic gap separates graph-structured data from textual modalities.
Method: This paper proposes a language-aware multimodal prompt learning framework that jointly optimizes structural graph prompts and textual prompts, aligning graph embeddings with the semantic space of large language models (LLMs) and yielding the first CLIP-style zero-shot GNN classification prototype. Contrastive learning and cross-modal alignment bridge the semantic gap between graphs and text.
Contribution/Results: The framework enables zero-shot, cross-domain, multi-task, and few-shot generalization to unseen classes. Empirical results show that it significantly outperforms existing baselines on zero-shot graph classification using only a handful of samples, each annotated with a short textual label, thereby introducing a novel paradigm for weakly supervised graph representation learning.
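The CLIP-style alignment and zero-shot classification described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the encoders are assumed given, and all names (`clip_contrastive_loss`, `zero_shot_classify`) are invented for exposition. It shows the standard symmetric InfoNCE objective that pulls each graph embedding toward its paired text embedding, and zero-shot prediction by nearest class-label embedding.

```python
# Sketch of CLIP-style graph-text contrastive alignment (illustrative only;
# graph/text embeddings are assumed to come from upstream encoders).
import numpy as np

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def clip_contrastive_loss(graph_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired graph/text embeddings."""
    g = l2_normalize(graph_emb)
    t = l2_normalize(text_emb)
    logits = g @ t.T / temperature          # (B, B) similarity matrix
    idx = np.arange(len(g))                 # matching pairs lie on the diagonal

    def xent(lg):
        # cross-entropy with the diagonal as the target class
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[idx, idx].mean()

    # average the graph->text and text->graph directions
    return 0.5 * (xent(logits) + xent(logits.T))

def zero_shot_classify(graph_emb, class_text_emb):
    """Assign each graph to the class whose text embedding is most similar."""
    sims = l2_normalize(graph_emb) @ l2_normalize(class_text_emb).T
    return sims.argmax(axis=1)
```

At inference time, only the short textual class labels are embedded; no graph from an unseen class needs labeled training examples, which is what makes the zero-shot setting possible.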
📝 Abstract
While great success has been achieved in building vision models with Contrastive Language-Image Pre-training (CLIP) over internet-scale image-text pairs, building transferable Graph Neural Networks (GNNs) with a CLIP-style pipeline is challenging because of the scarcity of labeled data and text supervision, the different levels of downstream tasks, and the conceptual gaps between domains. In this work, to address these issues, we propose a multi-modal prompt learning paradigm to effectively adapt pre-trained GNNs to downstream tasks and data, given only a few semantically labeled samples, each with extremely weak text supervision. Our new paradigm embeds the graphs directly in the same space as Large Language Models (LLMs) by learning both graph prompts and text prompts simultaneously. We demonstrate the superior performance of our paradigm in few-shot, multi-task-level, and cross-domain settings. Moreover, we build the first CLIP-style zero-shot classification prototype that can generalize GNNs to unseen classes with extremely weak text supervision. The code is available at https://github.com/Violet24K/Morpher.
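The joint graph-prompt/text-prompt idea from the abstract can be sketched as below. This is a hedged toy sketch, not the released code: the frozen encoders are mocked as fixed random projections, and every name (`graph_prompt`, `text_prompt`, `encode_graph`, `encode_text`) is an assumption for illustration. The point it demonstrates is that only the small prompt parameters adapt, while both pre-trained encoders stay frozen and map into a shared embedding space.

```python
# Toy sketch of multimodal prompt learning over frozen encoders
# (encoders are stand-in linear maps; names are illustrative).
import numpy as np

rng = np.random.default_rng(42)
D = 16  # shared embedding dimension (assumption)

# "Frozen" pre-trained encoders, mocked as fixed random projections.
W_gnn = rng.normal(size=(8, D))    # graph encoder: node features -> embedding
W_txt = rng.normal(size=(8, D))    # text encoder: token features -> embedding

# Learnable prompts: the only trainable parameters in this sketch.
graph_prompt = np.zeros(8)         # vector added to every node feature
text_prompt = np.zeros((2, 8))     # virtual tokens prepended to label tokens

def encode_graph(node_feats):
    """Prompted graph embedding: add the graph prompt to each node feature,
    mean-pool the nodes, and project with the frozen graph encoder."""
    return (node_feats + graph_prompt).mean(axis=0) @ W_gnn

def encode_text(label_tokens):
    """Prompted text embedding: prepend the learnable prompt tokens to the
    label's token features, mean-pool, and project with the frozen encoder."""
    tokens = np.vstack([text_prompt, label_tokens])
    return tokens.mean(axis=0) @ W_txt
```

In training, gradients from a contrastive graph-text objective would update only `graph_prompt` and `text_prompt`, which is why a few weakly labeled samples suffice to adapt the frozen GNN to a new task or domain.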