GiGL: Large-Scale Graph Neural Networks at Snapchat

📅 2025-02-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses the scale challenges that have slowed industrial adoption of graph neural networks (GNNs), focusing on training, inference, and utilization of GNNs on Snapchat's large social graphs. It presents GiGL (Gigantic Graph Learning), an open-source library for large-scale distributed graph ML. GiGL handles graph data preprocessing from relational databases, subgraph sampling, distributed training, inference, and orchestration, and it interfaces with open-source modeling libraries such as PyTorch Geometric (PyG) so that practitioners can focus on modeling. At Snapchat, GiGL is used in multiple production settings and has powered over 35 launches in the last 2 years across friend recommendation, content recommendation, and advertising. The paper covers the library's high-level design and tools, its scaling properties, case studies on industry-scale graphs, and lessons learned from applying graph ML to large social data.

📝 Abstract

Recent advances in graph machine learning (ML) with the introduction of Graph Neural Networks (GNNs) have led to a widespread interest in applying these approaches to business applications at scale. GNNs enable differentiable end-to-end (E2E) learning of model parameters given graph structure which enables optimization towards popular node, edge (link) and graph-level tasks. While the research innovation in new GNN layers and training strategies has been rapid, industrial adoption and utility of GNNs has lagged considerably due to the unique scale challenges that large-scale graph ML problems create. In this work, we share our approach to training, inference, and utilization of GNNs at Snapchat. To this end, we present GiGL (Gigantic Graph Learning), an open-source library to enable large-scale distributed graph ML to the benefit of researchers, ML engineers, and practitioners. We use GiGL internally at Snapchat to manage the heavy lifting of GNN workflows, including graph data preprocessing from relational DBs, subgraph sampling, distributed training, inference, and orchestration. GiGL is designed to interface cleanly with open-source GNN modeling libraries prominent in academia like PyTorch Geometric (PyG), while handling scaling and productionization challenges that make it easier for internal practitioners to focus on modeling. GiGL is used in multiple production settings, and has powered over 35 launches across multiple business domains in the last 2 years in the contexts of friend recommendation, content recommendation and advertising. This work details high-level design and tools the library provides, scaling properties, case studies in diverse business settings with industry-scale graphs, and several key lessons learned in employing graph ML at scale on large social data. GiGL is open-sourced at https://github.com/snap-research/GiGL.
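
To make the subgraph sampling step mentioned in the abstract more concrete, here is a minimal sketch of fanout-limited k-hop neighborhood sampling around a seed node. It is not GiGL's API or implementation: the function names `build_adjacency` and `sample_khop_subgraph`, the `fanouts` parameter, and the toy edge list are all hypothetical, chosen only to illustrate the general technique on a small in-memory graph.

```python
import random
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency list from (src, dst) pairs."""
    adj = defaultdict(list)
    for src, dst in edges:
        adj[src].append(dst)
        adj[dst].append(src)
    return adj


def sample_khop_subgraph(adj, seed, fanouts, rng):
    """Sample a subgraph around `seed`.

    At hop i, each frontier node keeps at most fanouts[i] randomly
    chosen neighbors. Returns the visited nodes and the sampled
    (node, neighbor) edges. Hypothetical helper, not part of GiGL.
    """
    nodes = {seed}
    sampled_edges = set()
    frontier = [seed]
    for fanout in fanouts:
        next_frontier = []
        for node in frontier:
            neighbors = adj.get(node, [])
            chosen = rng.sample(neighbors, min(fanout, len(neighbors)))
            for nbr in chosen:
                sampled_edges.add((node, nbr))
                if nbr not in nodes:
                    nodes.add(nbr)
                    next_frontier.append(nbr)
        frontier = next_frontier
    return nodes, sampled_edges


# Toy graph (made up for illustration only).
edges = [(0, 1), (0, 2), (0, 3), (1, 4), (2, 5), (3, 6), (6, 7)]
adj = build_adjacency(edges)

# Two hops: up to 2 neighbors at hop 1, up to 1 neighbor at hop 2.
nodes, sub_edges = sample_khop_subgraph(
    adj, seed=0, fanouts=[2, 1], rng=random.Random(0)
)
print(sorted(nodes), sorted(sub_edges))
```

Capping the number of neighbors per hop bounds the size of each sampled subgraph regardless of node degree, which is one common reason sampling is used when the full graph is too large to process at once. A production system would run this over distributed storage rather than an in-memory dictionary.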

Problem

Research questions and friction points this paper is trying to address.

Scalable Graph Neural Networks for industry applications

Enabling large-scale distributed graph machine learning

Addressing scaling and productionization challenges in GNNs

Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale distributed graph ML

GNN workflow management

Integration with PyTorch Geometric

🔎 Similar Papers

LLM-Enhanced User-Item Interactions: Leveraging Edge Information for Optimized Recommendations