Adacc: Adaptive Compression and Activation Checkpointing for LLM Memory Management

📅 2025-08-01
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
To address the overhead of recomputation (up to 30% in real-world scenarios), which is commonly used to alleviate memory pressure in large language model (LLM) training, this paper proposes Adacc, a memory management framework that integrates adaptive compression with activation checkpointing. Its core contributions are threefold: (1) layer-specific, outlier-aware compression algorithms that preserve model accuracy better than directly quantizing tensors from FP16 to INT4; (2) globally optimal scheduling of checkpointing and compression decisions via mixed-integer linear programming (MILP); and (3) an adaptive policy evolution mechanism that adjusts decisions online as tensors change during training. Under comparable accuracy constraints, Adacc achieves a 1.01×–1.37× speedup over state-of-the-art frameworks while significantly reducing GPU memory consumption, enabling more efficient and scalable LLM training without compromising model fidelity.

๐Ÿ“ Abstract
Training large language models often employs recomputation to alleviate memory pressure, which can introduce up to 30% overhead in real-world scenarios. In this paper, we propose Adacc, a novel memory management framework that combines adaptive compression and activation checkpointing to reduce the GPU memory footprint. It comprises three modules: (1) We design layer-specific compression algorithms that account for outliers in LLM tensors, instead of directly quantizing floats from FP16 to INT4, to ensure model accuracy. (2) We propose an optimal scheduling policy that employs MILP to determine the best memory optimization for each tensor. (3) To accommodate changes in training tensors, we introduce an adaptive policy evolution mechanism that adjusts the policy during training to enhance throughput. Experimental results show that Adacc can accelerate LLM training by 1.01× to 1.37× compared to state-of-the-art frameworks, while maintaining model accuracy comparable to the baseline.
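The abstract's first module, outlier-aware compression, can be illustrated with a minimal sketch. This is not the paper's actual algorithm: the function names, the 0.5% outlier fraction, and the single per-tensor scale are illustrative assumptions. The idea is to store the few largest-magnitude values losslessly in FP16 and quantize only the remaining values to INT4, so that outliers do not blow up the quantization scale.

```python
import numpy as np

def outlier_aware_quantize(tensor, outlier_pct=0.5):
    # Keep the largest-magnitude values ("outliers") in FP16 and quantize
    # the rest to 4-bit signed integers with a single per-tensor scale.
    flat = tensor.astype(np.float16).ravel()
    k = max(1, int(flat.size * outlier_pct / 100))
    outlier_idx = np.argpartition(np.abs(flat), -k)[-k:]
    outliers = flat[outlier_idx]                 # stored losslessly in FP16
    mask = np.ones(flat.size, dtype=bool)
    mask[outlier_idx] = False
    inliers = flat[mask].astype(np.float32)
    scale = np.abs(inliers).max() / 7.0          # INT4 value range is [-8, 7]
    q = np.clip(np.round(inliers / scale), -8, 7).astype(np.int8)
    return q, scale, outlier_idx, outliers

def dequantize(q, scale, outlier_idx, outliers, shape):
    # Reassemble the tensor: dequantized inliers plus exact FP16 outliers.
    flat = np.empty(int(np.prod(shape)), dtype=np.float16)
    mask = np.ones(flat.size, dtype=bool)
    mask[outlier_idx] = False
    flat[mask] = (q.astype(np.float32) * scale).astype(np.float16)
    flat[outlier_idx] = outliers
    return flat.reshape(shape)
```

Because outliers are excluded from the scale computation, the worst-case error on the quantized values stays bounded by the (now much smaller) scale, which is the accuracy advantage over naive FP16→INT4 quantization.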
Problem

Research questions and friction points this paper is trying to address.

Reducing GPU memory footprint in LLM training
Optimizing memory management via adaptive compression
Minimizing recomputation overhead while preserving accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Layer-specific compression for LLM tensor outliers
MILP-based optimal scheduling for tensor optimization
Adaptive policy evolution during training
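The MILP-based scheduling idea can be sketched with a tiny exhaustive search: for each tensor, choose one of three actions (keep in GPU memory, compress, or drop and recompute) so that total time overhead is minimized under a memory budget. This brute-force stand-in is only a conceptual illustration of the objective and constraint; the paper's actual formulation is a mixed-integer linear program, and all cost numbers below are made up.

```python
from itertools import product

def best_plan(tensors, mem_budget):
    # Exhaustive stand-in for the MILP: pick one action per tensor that
    # minimizes total time overhead while staying within the memory budget.
    actions = ("keep", "compress", "recompute")
    best = None
    for choice in product(actions, repeat=len(tensors)):
        mem = sum(t[a]["mem"] for t, a in zip(tensors, choice))
        if mem > mem_budget:
            continue
        time = sum(t[a]["time"] for t, a in zip(tensors, choice))
        if best is None or time < best[0]:
            best = (time, choice)
    return best

# Two activation tensors with hypothetical memory (GB) and overhead (ms) costs.
tensors = [
    {"keep":      {"mem": 4, "time": 0},
     "compress":  {"mem": 1, "time": 2},
     "recompute": {"mem": 0, "time": 5}},
    {"keep":      {"mem": 4, "time": 0},
     "compress":  {"mem": 1, "time": 3},
     "recompute": {"mem": 0, "time": 4}},
]
```

With a generous 8 GB budget the plan keeps everything in memory at zero overhead; tightening the budget to 5 GB makes the solver compress the cheaper-to-compress tensor and keep the other, which mirrors how the per-tensor MILP trades memory for time.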