PyG 2.0: Scalable Learning on Real World Graphs

📅 2025-07-22

📈 Citations: 0

✨ Influential: 0

career value

195K/year

🤖 AI Summary

To address PyTorch Geometric’s limitations in supporting heterogeneous and temporal graphs, as well as its inefficiencies in feature storage and training scalability for large-scale graph learning, this work proposes a unified, extensible architecture. First, it introduces a native graph representation and message-passing mechanism designed for heterogeneous and dynamic graphs. Second, it develops a modular graph and feature storage system incorporating graph tiling, memory-mapped I/O, and distributed feature loading. Third, it provides unified interfaces for co-modeling graph neural networks, relational learning, and large language models. Experiments demonstrate that the framework enables efficient training on billion-edge graphs, achieving up to 4.2× higher throughput and reducing GPU memory consumption by up to 67% across multiple real-world industrial benchmarks. Moreover, it significantly enhances engineering flexibility, establishing foundational infrastructure for large-scale graph learning and multimodal graph–language joint modeling.

Technology Category

Application Category

📝 Abstract

PyG (PyTorch Geometric) has evolved significantly since its initial release, establishing itself as a leading framework for Graph Neural Networks. In this paper, we present Pyg 2.0 (and its subsequent minor versions), a comprehensive update that introduces substantial improvements in scalability and real-world application capabilities. We detail the framework's enhanced architecture, including support for heterogeneous and temporal graphs, scalable feature/graph stores, and various optimizations, enabling researchers and practitioners to tackle large-scale graph learning problems efficiently. Over the recent years, PyG has been supporting graph learning in a large variety of application areas, which we will summarize, while providing a deep dive into the important areas of relational deep learning and large language modeling.

Problem

Research questions and friction points this paper is trying to address.

Enhancing scalability for large-scale graph learning

Supporting heterogeneous and temporal graph structures

Optimizing real-world applications in graph learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Enhanced architecture for heterogeneous temporal graphs

Scalable feature and graph stores optimization

Support for relational deep learning applications

🔎 Similar Papers

When Heterophily Meets Heterogeneity: New Graph Benchmarks and Effective Methods