GPU Memory Usage Optimization for Backward Propagation in Deep Network Training

📅 2025-02-01
🏛️ Journal of Parallel and Distributed Computing
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address excessive GPU memory peaks during the backward pass in deep neural network training—which constrain model scale and batch size—this paper proposes a gradient-lifetime-aware dynamic memory scheduling mechanism. Our method jointly models computational graph dependencies and tensor lifetimes to enable fine-grained, zero-copy reuse of gradient memory. By integrating CUDA Graph with PyTorch Autograd hooks, it unifies static graph analysis and runtime dynamic memory reclamation, ensuring seamless compatibility with arbitrary automatic differentiation frameworks. Evaluated on ResNet-50 and ViT-L, our approach reduces backward-pass memory consumption by 47%–63%, improves training throughput by 1.8×, and enables doubling the batch size without gradient checkpointing.
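The core idea of lifetime-aware reuse can be illustrated with a small model (this is a hypothetical sketch, not the paper's implementation): each gradient buffer is live only between its first and last use in the backward pass, so peak memory is the maximum number of *simultaneously live* bytes, not the total bytes allocated. The `peak_memory` helper and the sample intervals below are illustrative assumptions.

```python
# Hypothetical sketch of lifetime-aware gradient-buffer accounting:
# each gradient has a size and a [first_use, last_use] step interval
# in the backward pass. If a buffer is released right after its last
# use and handed to a later gradient, peak memory tracks live bytes
# rather than the sum of all allocations.

def peak_memory(intervals):
    """intervals: list of (size_bytes, first_step, last_step).
    Returns peak live bytes when each buffer is freed at last use."""
    events = []
    for size, first, last in intervals:
        events.append((first, size))      # allocate at first use
        events.append((last + 1, -size))  # free just after last use
    live = peak = 0
    # At equal steps, frees (negative deltas) sort before allocations,
    # which models zero-copy handoff of a freed buffer.
    for _, delta in sorted(events):
        live += delta
        peak = max(peak, live)
    return peak

# Three 4 MB gradients with disjoint lifetimes peak at 4 MB, not 12 MB.
grads = [(4 << 20, 0, 1), (4 << 20, 2, 3), (4 << 20, 4, 5)]
print(peak_memory(grads))  # 4194304
```

With overlapping lifetimes the same helper reports the true worst case, which is what a scheduler would have to compare against the naive sum-of-all-gradients figure.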

Problem

Research questions and friction points this paper is trying to address.

Reduce peak GPU memory during the backward pass of deep network training
Select checkpoint placements that trade recomputation for memory efficiency
Develop scheduling algorithms that minimize peak memory usage
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic programming over the computational graph to place checkpoints
Lifetime-aware reuse of gradient buffers to optimize GPU memory usage
Reduced peak memory consumption during the backward pass
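The checkpoint-selection objective above can be made concrete with a toy cost model (an illustrative assumption, not the paper's formulation): a checkpoint set must hold its stored activations plus the largest per-segment activation sum that gets rematerialized during the backward pass. The exhaustive search below shows the objective; the paper's dynamic programming would explore the same space more efficiently. The function names and the uniform-size example are hypothetical.

```python
# Illustrative brute-force over checkpoint placements for a chain of
# layers. Modeled peak = bytes held by checkpoints + the largest
# per-segment activation sum recomputed between two checkpoints.
from itertools import combinations

def peak_for(act, cps):
    """act: per-layer activation bytes; cps: sorted checkpoint layer
    indices (layer 0's input is always kept). Returns modeled peak."""
    bounds = [0] + list(cps) + [len(act)]
    stored = sum(act[i] for i in [0] + list(cps))
    largest = max(sum(act[b:e]) for b, e in zip(bounds, bounds[1:]))
    return stored + largest

def best_checkpoints(act):
    """Exhaustively pick the checkpoint set minimizing modeled peak."""
    n = len(act)
    best = (peak_for(act, ()), ())
    for k in range(1, n):
        for cps in combinations(range(1, n), k):
            best = min(best, (peak_for(act, cps), cps))
    return best

# Nine equal-size layers: checkpointing every third layer stores 3
# activations and recomputes at most 3, beating both extremes.
peak, cps = best_checkpoints([1] * 9)
print(peak, cps)  # 6 (3, 6)
```

For a uniform chain this recovers the classic square-root-style trade-off: no checkpoints gives a modeled peak of 10, while two evenly spaced checkpoints give 6.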