🤖 AI Summary
To address the I/O bottleneck and compromised sample randomness that arise when training datasets vastly exceed main memory capacity in deep learning, this paper proposes a chunk-based, batched random-access memory management framework. The method partitions data into fixed-size chunks and introduces a deterministic, chunk-level random sampling schedule that preserves convergence while eliminating fine-grained read overhead. It further enables cooperative asynchronous prefetching in both single-node and multi-node settings, overcoming the inherent I/O limitations of conventional PyTorch DataLoaders, and its lightweight runtime remains fully backward compatible with existing DataLoader APIs. Experiments demonstrate up to 4.57× end-to-end training speedup, 3.8× higher data loading throughput, significantly reduced GPU idle time, and scalability in large-scale distributed training on up to one thousand GPUs.
📝 Abstract
This paper proposes Brand, a comprehensive memory management system for deep learning training (DLT) where memory capacity is much smaller than the size of the training dataset. Brand starts with a bold design choice: data files are always read from disk in batches, called chunks. Based on this assumption, we propose efficient data access protocols for both the single-node setting and distributed environments with multiple nodes. The protocols minimize wasted data reads due to the larger access granularity and enable efficient inter-node prefetching, while still ensuring the randomness required by DLT. Experimental results indicate that Brand significantly accelerates data fetching in DLT, achieving up to a 4.57x improvement in end-to-end training compared to PyTorch.
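The abstract does not show Brand's actual protocol; as a rough illustration of the idea of chunk-level random sampling, the following sketch partitions sample indices into fixed-size chunks, randomizes the chunk visit order, and shuffles within each chunk, so every disk read can happen at whole-chunk granularity. The function name `chunked_epoch_order` and its parameters are illustrative, not from the paper.

```python
import random

def chunked_epoch_order(num_samples, chunk_size, seed):
    """Yield sample indices for one epoch using chunk-level shuffling.

    All reads occur at whole-chunk granularity: the chunk visit order is
    randomized and samples are shuffled within each chunk, approximating
    global random sampling without fine-grained per-sample reads.
    """
    rng = random.Random(seed)  # deterministic given the seed
    # Partition sample indices into fixed-size chunks (last may be short).
    chunks = [list(range(start, min(start + chunk_size, num_samples)))
              for start in range(0, num_samples, chunk_size)]
    rng.shuffle(chunks)           # randomize chunk visit order
    for chunk in chunks:
        rng.shuffle(chunk)        # randomize samples within the chunk
        yield from chunk

# Each epoch visits every sample exactly once, in a seed-determined order.
order = list(chunked_epoch_order(num_samples=10, chunk_size=4, seed=0))
assert sorted(order) == list(range(10))
```

Because the schedule is a pure function of the seed, every worker in a distributed job can reproduce the same order locally, which is what makes cooperative prefetching of upcoming chunks possible without coordination traffic.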