🤖 AI Summary
This work challenges the prevailing assumption in graph contrastive learning (GCL) that "more negative samples are always better," revealing that excessive negatives degrade node semantic discriminability. To address this, we propose E2Neg, a topology-decoupled negative sampling method that selects only a small set of high-quality, low-topological-coupling representative negatives, thereby significantly enhancing semantic separability. Theoretical analysis demonstrates that E2Neg mitigates gradient interference and semantic confusion inherent in the InfoNCE loss. Extensive experiments on multiple benchmark datasets show that E2Neg achieves state-of-the-art performance, accelerates training by 2.1–3.8×, and reduces GPU memory consumption by 67%. These results validate the effectiveness and efficiency of sparse, semantics-oriented negative sampling in GCL.
📝 Abstract
Graph Contrastive Learning (GCL) aims to learn low-dimensional graph representations in a self-supervised manner, primarily through instance discrimination: positive and negative pairs are manually mined from the graph, and the model increases the similarity of positive pairs while decreasing that of negative pairs. Drawing from the success of Contrastive Learning (CL) in other domains, a consensus has been reached that the effectiveness of GCL depends on a large number of negative pairs. As a result, despite the significant computational overhead, GCL methods typically leverage as many negative node pairs as possible to improve model performance. However, given that nodes within a graph are interconnected, we argue that nodes cannot be treated as independent instances. We therefore challenge this consensus: does employing more negative nodes lead to a more effective GCL model? To answer this, we examine the role of negative nodes in the commonly used InfoNCE loss for GCL and observe that: (1) counterintuitively, a large number of negative nodes can actually hinder the model's ability to distinguish nodes with different semantics; (2) a small number of high-quality, non-topologically coupled negative nodes is sufficient to enhance the discriminability of representations. Based on these findings, we propose a new method, GCL with Effective and Efficient Negative samples (E2Neg), which learns discriminative representations using only a very small set of representative negative samples. E2Neg significantly reduces computational overhead and speeds up model training. We demonstrate the effectiveness and efficiency of E2Neg across multiple datasets compared to other GCL methods.
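To make the objective concrete, below is a minimal numpy sketch of the InfoNCE loss over an explicit per-node negative set. The function name, argument layout, and the idea of passing a small `neg_idx` matrix are illustrative assumptions for exposition, not the authors' implementation; E2Neg's actual negative selection is topology-aware, which this sketch does not reproduce.

```python
import numpy as np

def infonce_loss(z, pos_idx, neg_idx, tau=0.5):
    """InfoNCE loss over an explicit negative set (illustrative sketch).

    z       : (N, d) L2-normalized node embeddings
    pos_idx : (N,)   index of each node's positive counterpart
    neg_idx : (N, K) indices of K negatives per node; in the spirit of
              E2Neg, K can be very small (a few representative negatives)
              instead of all N-1 other nodes.
    """
    # Similarity to the positive counterpart, scaled by temperature tau.
    pos_sim = np.exp(np.sum(z * z[pos_idx], axis=1) / tau)             # (N,)
    # Summed similarity to the K chosen negatives per node.
    neg_sim = np.exp(np.einsum('nd,nkd->nk', z, z[neg_idx]) / tau)     # (N, K)
    neg_sim = neg_sim.sum(axis=1)                                      # (N,)
    # Average negative log-ratio: pull positives together, push negatives apart.
    return float(np.mean(-np.log(pos_sim / (pos_sim + neg_sim))))

# Example usage: 8 nodes, 4-dim embeddings, only K=3 negatives per node.
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 4))
z /= np.linalg.norm(z, axis=1, keepdims=True)
pos_idx = np.arange(8)                       # e.g. the same node in another view
neg_idx = rng.integers(0, 8, size=(8, 3))    # small, fixed negative set
loss = infonce_loss(z, pos_idx, neg_idx)
```

Shrinking `K` from N-1 to a handful of representatives is exactly the lever the paper argues for: the denominator sums over far fewer terms, cutting both compute and the gradient interference attributed to low-quality negatives.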