Scalable and Performant Data Loading

📅 2025-04-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Data loading frequently bottlenecks GPU computation during AI training. To address this, the authors propose SPDL, a framework-agnostic, open-source data loading library that exploits preprocessing functions which release the GIL to run them concurrently in a thread pool, supporting Free-Threaded Python while remaining backward compatible with existing code. Its design coordinates network calls, CPU-bound preprocessing, and GPU device transfer within a GIL-aware thread pool. Compared to the PyTorch DataLoader on ImageNet, SPDL iterates 74% faster while using 38% less CPU and 50 GB less memory, and it delivers data to the GPU fast enough that ViT-B/16 training does not starve. On Python 3.13t, throughput improves by a further 33% without any code changes, thanks to the disabled GIL. SPDL thus offers a scalable, high-performance, and cross-version-compatible approach to AI data loading.

📝 Abstract
We present SPDL (Scalable and Performant Data Loading), an open-source, framework-agnostic library designed for efficiently loading array data to GPU. Data loading is often a bottleneck in AI applications, and is challenging to optimize because it requires coordination of network calls, CPU-bound tasks, and GPU device transfer. On top of that, Python's GIL (Global Interpreter Lock) makes it difficult to gain performance improvement from multi-threading. We found that when data preprocessing functions release the GIL entirely, it is possible to execute them concurrently in a thread pool, thereby improving the workflow performance. Our benchmark shows that compared to the PyTorch DataLoader, SPDL can iterate through the ImageNet dataset 74% faster while using 38% less CPU and 50 GB less memory. When training a ViT-B/16 model, SPDL can send data to the GPU at a speed that does not starve the training. Additionally, when using SPDL on Python 3.13t, without changing any code, the throughput is further improved by 33%, thanks to the disabled GIL. SPDL can improve the performance of current AI model training, and receives further performance improvements when Free-Threaded Python is adopted in production systems. SPDL is available at https://github.com/facebookresearch/spdl.
Problem

Research questions and friction points this paper is trying to address.

Efficiently loading array data to GPU for AI applications
Overcoming Python's GIL limitations in multi-threaded data preprocessing
Reducing CPU and memory usage while accelerating data loading
Innovation

Methods, ideas, or system contributions that make the work stand out.

Framework-agnostic library for GPU array loading
Concurrent execution via GIL-free thread pool
Optimized network, CPU, and GPU coordination
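The core idea above can be illustrated with standard-library tools (this is a minimal sketch of the principle, not SPDL's actual API): preprocessing functions that spend most of their time in GIL-releasing C code, such as NumPy ufuncs or blocking I/O, can make real parallel progress when mapped over a thread pool. The `preprocess` function and its sizes here are hypothetical placeholders.

```python
# Sketch of GIL-release-based concurrent preprocessing (illustrative only,
# not SPDL's API). NumPy releases the GIL inside its C loops, so several
# worker threads can execute these calls in parallel even on standard CPython.
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def preprocess(seed: int) -> np.ndarray:
    # Stand-in for decode + transform work on one sample.
    rng = np.random.default_rng(seed)
    img = rng.random((224, 224, 3), dtype=np.float32)
    # Normalize: the heavy array math runs with the GIL released.
    return (img - img.mean()) / (img.std() + 1e-8)

# Map samples over a thread pool instead of worker processes,
# avoiding per-worker memory copies and IPC serialization.
with ThreadPoolExecutor(max_workers=8) as pool:
    batch = list(pool.map(preprocess, range(32)))

print(len(batch), batch[0].shape)
```

On a Free-Threaded build (Python 3.13t) the same code runs unchanged, and even pure-Python portions of `preprocess` stop serializing on the GIL, which is consistent with the extra throughput the paper reports.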
Authors
Moto Hira (Meta Platforms, Inc)
Christian Puhrsch (Meta Platforms, Inc, Menlo Park, California, USA)
Valentin Andrei (Meta Platforms, Inc, Menlo Park, California, USA)
Roman Malinovskyy (Meta Platforms, Inc, Menlo Park, California, USA)
Gaël Le Lan (Meta Reality Labs)
Abhinandan Krishnan (Meta Platforms, Inc, Menlo Park, California, USA)
Joseph Cummings (Meta Platforms, Inc, Menlo Park, California, USA)
Miguel Martin (Meta Platforms, Inc, Menlo Park, California, USA)
Gokul Gunasekaran (Meta Platforms, Inc, Menlo Park, California, USA)
Yuta Inoue (Meta Platforms, Inc, Menlo Park, California, USA)
Alex J Turner (Meta Platforms, Inc, Menlo Park, California, USA)
Raghuraman Krishnamoorthi (Meta Platforms, Inc, Menlo Park, California, USA)