Enhancing Memory Efficiency in Large Language Model Training Through Chronos-aware Pipeline Parallelism

📅 2025-03-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the HBM capacity bottleneck in large language model (LLM) pretraining, this paper proposes ChronosPipe—a framework that treats HBM as a high-speed cache and introduces the novel concept of Chronos-aware scheduling. ChronosPipe integrates three synergistic techniques: Chronos-Pipe (temporal-aware pipeline scheduling), Chronos-Recomp (activation recomputation guided by temporal locality), and Chronos-Offload (hierarchical offloading of weights and activations). By jointly modeling and exploiting the temporal locality of activations and weights, ChronosPipe enables fine-grained, HBM-aware memory scheduling. Evaluated under constant throughput constraints, it scales the trainable model size by 2.4× compared to baseline approaches and improves memory efficiency by 1.5× over conventional 1F1B with recomputation. The core contribution lies in formalizing temporal locality patterns in LLM training and leveraging them for hardware-aware memory orchestration—thereby significantly alleviating HBM pressure without sacrificing computational efficiency.
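The summary's core idea, treating HBM as a small, fast cache and scheduling memory by temporal locality, can be sketched as follows. This is an illustrative sketch only, not the paper's actual algorithm: every name, threshold, and cost value here is a hypothetical stand-in for the paper's real scheduling logic. The intent is to show how reuse distance (time until the backward pass needs a tensor) could drive a keep / recompute / offload decision per activation.

```python
# Hypothetical sketch of temporal-locality-guided memory planning
# (not code from the paper). Tensors whose backward use is imminent
# stay in HBM; long-lived tensors are recomputed if cheap, or
# offloaded if their idle time can hide the transfer.
from dataclasses import dataclass

@dataclass
class Activation:
    name: str
    size_gb: float
    reuse_distance: int    # scheduler steps until backward needs it
    recompute_cost: float  # relative compute cost to regenerate it

def plan_placement(acts, hbm_budget_gb, offload_threshold=8):
    """Assign each activation to 'keep', 'recompute', or 'offload'."""
    plan = {}
    # Short reuse distance = strong temporal locality = keep in HBM.
    for a in sorted(acts, key=lambda a: a.reuse_distance):
        if sum(x.size_gb for x in acts if plan.get(x.name) == "keep") \
                + a.size_gb <= hbm_budget_gb:
            plan[a.name] = "keep"
        elif a.recompute_cost < 1.0:            # cheap to regenerate
            plan[a.name] = "recompute"
        elif a.reuse_distance >= offload_threshold:
            plan[a.name] = "offload"            # long idle time hides transfer
        else:
            plan[a.name] = "recompute"
    return plan

acts = [
    Activation("attn_out", 2.0, reuse_distance=2,  recompute_cost=1.5),
    Activation("mlp_in",   2.0, reuse_distance=12, recompute_cost=0.5),
    Activation("embed",    1.0, reuse_distance=20, recompute_cost=2.0),
]
print(plan_placement(acts, hbm_budget_gb=2.0))
# -> {'attn_out': 'keep', 'mlp_in': 'recompute', 'embed': 'offload'}
```

The design choice mirrors the summary's framing: HBM residency is granted greedily to the tensors with the strongest temporal locality, and the two eviction paths (recomputation vs. offloading) correspond to Chronos-Recomp and Chronos-Offload respectively.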

📝 Abstract
Larger model sizes and longer sequence lengths have empowered Large Language Models (LLMs) to achieve outstanding performance across various domains. However, this progress brings significant storage capacity challenges for LLM pretraining. High Bandwidth Memory (HBM) is expensive and requires more advanced packaging technologies for capacity expansion, creating an urgent need for memory-efficient scheduling strategies. Yet, prior pipeline parallelism schedules have primarily focused on reducing bubble overhead, often neglecting memory efficiency and lacking compatibility with other memory-efficient strategies. Consequently, these methods struggle to meet the storage capacity demands of next-generation LLMs. This work presents ChronosPipe, a Chronos-aware pipeline parallelism for memory-efficient LLM pretraining. The core insight of ChronosPipe is to treat HBM as a fast but small 'cache,' optimizing and exploiting temporal locality within LLM pretraining to enhance HBM utilization. ChronosPipe introduces a pipeline scheduling strategy, Chronos-Pipe, to reduce the extrinsic overhead that disrupts the temporal locality of activations. Additionally, it leverages Chronos-Recomp and Chronos-Offload to efficiently harness the intrinsic temporal locality of activations and weights in deep neural networks. Experimental results show that ChronosPipe can expand the trainable model size by 2.4x while maintaining comparable throughput, achieving 1.5x better memory efficiency than the 1F1B strategy combined with recomputation.
Problem

Research questions and friction points this paper is trying to address.

Addresses memory efficiency in large language model training.
Optimizes High Bandwidth Memory (HBM) utilization for LLM pretraining.
Introduces ChronosPipe to expand the trainable model size while maintaining comparable throughput.
Innovation

Methods, ideas, or system contributions that make the work stand out.

ChronosPipe treats HBM as a fast but small cache
Chronos-Pipe reschedules the pipeline to reduce extrinsic overhead that disrupts activation temporal locality
Chronos-Recomp and Chronos-Offload exploit the intrinsic temporal locality of activations and weights
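The scheduling pressure that motivates these techniques can be illustrated with a standard piece of 1F1B accounting (this is general pipeline-parallelism arithmetic, not code from the paper): under 1F1B, stage i must complete roughly (num_stages - i) forward microbatches before its first backward, so that many microbatches' activations are live at once. The earliest stages therefore hold activations the longest, which is precisely the long-reuse-distance temporal locality that Chronos-aware scheduling targets.

```python
# Standard 1F1B activation-pressure accounting (illustrative, not from
# the paper): stage i keeps (num_stages - i) microbatches' activations
# resident at its peak, so early stages bear the most HBM pressure.
def peak_inflight_activations(num_stages: int) -> list[int]:
    """Per-stage peak count of live microbatch activations under 1F1B."""
    return [num_stages - stage for stage in range(num_stages)]

print(peak_inflight_activations(4))  # -> [4, 3, 2, 1]
```

The skewed [4, 3, 2, 1] profile shows why a locality-aware schedule helps: the activations on early stages sit idle the longest between their forward and backward passes, making them the natural candidates for recomputation or offloading.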