Scalable and Performant Data Loading

📅 2025-04-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Data loading frequently bottlenecks GPU computation during AI training. To address this, the authors propose SPDL, a framework-agnostic, open-source data loading library that exploits preprocessing functions which release the GIL to run them concurrently in a thread pool, supporting Free-Threaded Python while remaining backward compatible with existing code. Its design coordinates network calls, CPU-bound preprocessing, and GPU device transfer within a GIL-aware thread pool. Compared to the PyTorch DataLoader on ImageNet, SPDL iterates 74% faster while using 38% less CPU and 50 GB less memory, and it delivers data to the GPU fast enough that ViT-B/16 training does not starve. On Python 3.13t, throughput improves by a further 33% without any code changes, thanks to the disabled GIL. SPDL thus offers a scalable, high-performance, and cross-version-compatible approach to AI data loading.

📝 Abstract
We present SPDL (Scalable and Performant Data Loading), an open-source, framework-agnostic library designed for efficiently loading array data to GPU. Data loading is often a bottleneck in AI applications, and is challenging to optimize because it requires coordination of network calls, CPU-bound tasks, and GPU device transfer. On top of that, Python's GIL (Global Interpreter Lock) makes it difficult to gain performance improvement from multi-threading. We found that when data preprocessing functions release the GIL entirely, it is possible to execute them concurrently in a thread pool, thereby improving the workflow performance. Our benchmark shows that compared to the PyTorch DataLoader, SPDL can iterate through the ImageNet dataset 74% faster while using 38% less CPU and 50 GB less memory. When training a ViT-B/16 model, SPDL can send data to the GPU at a speed that does not starve the training. Additionally, when using SPDL on Python 3.13t, without changing any code, the throughput is further improved by 33%, thanks to the disabled GIL. SPDL can improve the performance of current AI model training, and receives further performance improvements when Free-Threaded Python is adopted in production systems. SPDL is available at https://github.com/facebookresearch/spdl.
Problem

Research questions and friction points this paper is trying to address.

Efficiently loading array data to GPU for AI applications
Overcoming Python's GIL limitations in multi-threaded data preprocessing
Reducing CPU and memory usage while accelerating data loading
Innovation

Methods, ideas, or system contributions that make the work stand out.

Framework-agnostic library for GPU array loading
Concurrent execution via GIL-free thread pool
Optimized network, CPU, and GPU coordination
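The core idea above can be illustrated with standard-library tools (this is a minimal sketch of the principle, not SPDL's actual API): preprocessing functions that spend most of their time in GIL-releasing C code, such as NumPy ufuncs or blocking I/O, can make real parallel progress when mapped over a thread pool. The `preprocess` function and its sizes here are hypothetical placeholders.

```python
# Sketch of GIL-release-based concurrent preprocessing (illustrative only,
# not SPDL's API). NumPy releases the GIL inside its C loops, so several
# worker threads can execute these calls in parallel even on standard CPython.
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def preprocess(seed: int) -> np.ndarray:
    # Stand-in for decode + transform work on one sample.
    rng = np.random.default_rng(seed)
    img = rng.random((224, 224, 3), dtype=np.float32)
    # Normalize: the heavy array math runs with the GIL released.
    return (img - img.mean()) / (img.std() + 1e-8)

# Map samples over a thread pool instead of worker processes,
# avoiding per-worker memory copies and IPC serialization.
with ThreadPoolExecutor(max_workers=8) as pool:
    batch = list(pool.map(preprocess, range(32)))

print(len(batch), batch[0].shape)
```

On a Free-Threaded build (Python 3.13t) the same code runs unchanged, and even pure-Python portions of `preprocess` stop serializing on the GIL, which is consistent with the extra throughput the paper reports.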
Authors
Moto Hira (Meta Platforms, Inc)
Christian Puhrsch (Meta Platforms, Inc, Menlo Park, California, USA)
Valentin Andrei (Meta Platforms, Inc, Menlo Park, California, USA)
Roman Malinovskyy (Meta Platforms, Inc, Menlo Park, California, USA)
Gaël Le Lan (Meta Reality Labs)
Abhinandan Krishnan (Meta Platforms, Inc, Menlo Park, California, USA)
Joseph Cummings (Meta Platforms, Inc, Menlo Park, California, USA)
Miguel Martin (Meta Platforms, Inc, Menlo Park, California, USA)
Gokul Gunasekaran (Meta Platforms, Inc, Menlo Park, California, USA)
Yuta Inoue (Meta Platforms, Inc, Menlo Park, California, USA)
Alex J Turner (Meta Platforms, Inc, Menlo Park, California, USA)
Raghuraman Krishnamoorthi (Meta Platforms, Inc, Menlo Park, California, USA)