🤖 AI Summary
To address PyTorch Geometric’s limitations in supporting heterogeneous and temporal graphs, as well as its inefficiencies in feature storage and training scalability for large-scale graph learning, this work proposes a unified, extensible architecture. First, it introduces a native graph representation and message-passing mechanism designed for heterogeneous and dynamic graphs. Second, it develops a modular graph and feature storage system incorporating graph tiling, memory-mapped I/O, and distributed feature loading. Third, it provides unified interfaces for co-modeling graph neural networks, relational learning, and large language models. Experiments demonstrate that the framework enables efficient training on billion-edge graphs, achieving up to 4.2× higher throughput and reducing GPU memory consumption by up to 67% across multiple real-world industrial benchmarks. Moreover, it significantly enhances engineering flexibility, establishing foundational infrastructure for large-scale graph learning and multimodal graph–language joint modeling.
📝 Abstract
PyG (PyTorch Geometric) has evolved significantly since its initial release, establishing itself as a leading framework for Graph Neural Networks. In this paper, we present Pyg 2.0 (and its subsequent minor versions), a comprehensive update that introduces substantial improvements in scalability and real-world application capabilities. We detail the framework's enhanced architecture, including support for heterogeneous and temporal graphs, scalable feature/graph stores, and various optimizations, enabling researchers and practitioners to tackle large-scale graph learning problems efficiently. Over the recent years, PyG has been supporting graph learning in a large variety of application areas, which we will summarize, while providing a deep dive into the important areas of relational deep learning and large language modeling.