Generalizing Graph Transformers Across Diverse Graphs and Tasks via Pre-Training on Industrial-Scale Data

📅 2024-07-04
🏛️ IEEE Transactions on Knowledge and Data Engineering
📈 Citations: 8
Influential: 0
🤖 AI Summary
To address weak inductive generalization and severe cross-graph and cross-task negative transfer on industrial-scale graphs with tens of billions of nodes and edges, this paper proposes PGT, a general-purpose graph pre-training framework. Built on a masked autoencoder architecture, PGT uses two pre-training tasks that jointly reconstruct node features and local structural patterns, and introduces a decoder-driven feature augmentation strategy: rather than discarding the pre-trained decoder, it reuses the decoder to enrich node features, enabling efficient inductive learning with Transformers on dynamic billion-scale graphs. Combined with graph-structure-aware modeling and distributed sampling-based training, PGT achieves state-of-the-art performance on ogbn-papers100M (111 million nodes) and has been deployed on Tencent's online game graph (540 million nodes, 12 billion edges), with significant improvements across diverse static and dynamic downstream tasks.

📝 Abstract
Graph pre-training has been concentrated on graph-level tasks involving small graphs (e.g., molecular graphs) or learning node representations on a fixed graph. Extending graph pre-trained models to web-scale graphs with billions of nodes in industrial scenarios, while avoiding negative transfer across graphs or tasks, remains a challenge. We aim to develop a general graph pre-trained model with inductive ability that can make predictions for unseen new nodes and even new graphs. In this work, we introduce a scalable transformer-based graph pre-training framework called PGT (Pre-trained Graph Transformer). Based on the masked autoencoder architecture, we design two pre-training tasks: one for reconstructing node features and the other for reconstructing local structures. Unlike the original autoencoder architecture where the pre-trained decoder is discarded, we propose a novel strategy that utilizes the decoder for feature augmentation. Our framework, tested on the publicly available ogbn-papers100M dataset with 111 million nodes and 1.6 billion edges, achieves state-of-the-art performance, showcasing scalability and efficiency. We have deployed our framework on Tencent's online game data, confirming its capability to pre-train on real-world graphs with over 540 million nodes and 12 billion edges and to generalize effectively across diverse static and dynamic downstream tasks.
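The abstract describes two pre-training tasks on top of a masked autoencoder: reconstructing node features and reconstructing local structure. A minimal numpy sketch of such a dual-task objective is shown below. The linear encoder/decoders, the zero-vector masking, the MSE feature loss, and the inner-product edge scoring are all illustrative assumptions, not the paper's actual architecture or loss functions.

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_features(x, mask_rate=0.5):
    """Zero out a random subset of node feature rows (a stand-in for a
    learnable [MASK] token) and return the boolean node mask."""
    mask = rng.random(x.shape[0]) < mask_rate
    x_masked = x.copy()
    x_masked[mask] = 0.0
    return x_masked, mask

def dual_task_loss(x, adj, encode, feat_decode, struct_decode):
    """Sum of a feature-reconstruction loss (MSE on masked nodes) and a
    structure-reconstruction loss (BCE on inner-product edge scores)."""
    x_masked, mask = mask_features(x)
    h = encode(x_masked)                       # node embeddings
    # Task 1: reconstruct the original features of the masked nodes.
    x_hat = feat_decode(h)
    feat_loss = np.mean((x_hat[mask] - x[mask]) ** 2) if mask.any() else 0.0
    # Task 2: reconstruct local structure from pairwise embedding scores.
    z = struct_decode(h)
    probs = 1.0 / (1.0 + np.exp(-(z @ z.T)))   # sigmoid edge probabilities
    eps = 1e-9
    struct_loss = -np.mean(adj * np.log(probs + eps)
                           + (1 - adj) * np.log(1 - probs + eps))
    return feat_loss + struct_loss

# Toy setup: 6 nodes, 4-dim features, random linear encoder/decoders.
n, d, k = 6, 4, 3
x = rng.normal(size=(n, d))
adj = (rng.random((n, n)) < 0.3).astype(float)
np.fill_diagonal(adj, 0.0)
W_enc, W_feat, W_str = (rng.normal(size=s) for s in [(d, k), (k, d), (k, k)])
loss = dual_task_loss(x, adj, lambda a: a @ W_enc,
                      lambda h: h @ W_feat, lambda h: h @ W_str)
```

In a real pre-training loop the two losses would drive gradient updates of the encoder and both decoders; here the weights are fixed random matrices purely to make the objective concrete.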
Problem

Research questions and friction points this paper is trying to address.

Developing general graph pre-trained models for unseen nodes and graphs
Extending pre-training to web-scale graphs with billions of nodes
Avoiding negative transfer across diverse graphs and tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based graph pre-training framework
Masked autoencoder with feature and structure reconstruction
Decoder utilization for feature augmentation strategy
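The last point above, keeping the pre-trained decoder for feature augmentation instead of discarding it, can be sketched as follows. The concatenation of the encoder embedding with the decoder's reconstruction is one plausible reading of "feature augmentation"; the exact combination used by PGT is not specified here, so treat this as a hypothetical illustration.

```python
import numpy as np

def augment_features(h, decode):
    """Reuse the pretrained decoder's reconstruction as an extra feature
    view, concatenated with the encoder embedding for downstream tasks."""
    x_hat = decode(h)                   # decoder output per node
    return np.concatenate([h, x_hat], axis=1)

rng = np.random.default_rng(1)
h = rng.normal(size=(5, 3))             # encoder embeddings for 5 nodes
W_dec = rng.normal(size=(3, 8))         # toy stand-in for pretrained decoder
feats = augment_features(h, lambda z: z @ W_dec)
# feats: one row per node, embedding followed by the decoded view.
```

A downstream classifier would then consume `feats` in place of the raw embeddings, which is what gives the discarded-by-default decoder a second use.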