Adacc: Adaptive Compression and Activation Checkpointing for LLM Memory Management

📅 2025-08-01
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
To address the overhead of recomputation (up to 30% in real-world scenarios), which is commonly used to alleviate memory pressure in large language model (LLM) training, this paper proposes Adacc, a memory management framework that integrates adaptive compression with activation checkpointing. Its core contributions are threefold: (1) layer-specific, outlier-aware compression algorithms that preserve model accuracy better than directly quantizing tensors from FP16 to INT4; (2) globally optimal scheduling of checkpointing and compression decisions via mixed-integer linear programming (MILP); and (3) an adaptive policy evolution mechanism that adjusts decisions online as tensors change during training. Under comparable accuracy constraints, Adacc achieves a 1.01×–1.37× speedup over state-of-the-art frameworks while significantly reducing GPU memory consumption, enabling more efficient and scalable LLM training without compromising model fidelity.

๐Ÿ“ Abstract
Training large language models often employs recomputation to alleviate memory pressure, which can introduce up to 30% overhead in real-world scenarios. In this paper, we propose Adacc, a novel memory management framework that combines adaptive compression and activation checkpointing to reduce the GPU memory footprint. It comprises three modules: (1) We design layer-specific compression algorithms that account for outliers in LLM tensors, instead of directly quantizing floats from FP16 to INT4, to ensure model accuracy. (2) We propose an optimal scheduling policy that employs MILP to determine the best memory optimization for each tensor. (3) To accommodate changes in training tensors, we introduce an adaptive policy evolution mechanism that adjusts the policy during training to enhance throughput. Experimental results show that Adacc can accelerate LLM training by 1.01× to 1.37× compared to state-of-the-art frameworks, while maintaining model accuracy comparable to the baseline.
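The abstract's first module, outlier-aware compression, can be illustrated with a minimal sketch. This is not the paper's actual algorithm: the function names, the 0.5% outlier fraction, and the single per-tensor scale are illustrative assumptions. The idea is to store the few largest-magnitude values losslessly in FP16 and quantize only the remaining values to INT4, so that outliers do not blow up the quantization scale.

```python
import numpy as np

def outlier_aware_quantize(tensor, outlier_pct=0.5):
    # Keep the largest-magnitude values ("outliers") in FP16 and quantize
    # the rest to 4-bit signed integers with a single per-tensor scale.
    flat = tensor.astype(np.float16).ravel()
    k = max(1, int(flat.size * outlier_pct / 100))
    outlier_idx = np.argpartition(np.abs(flat), -k)[-k:]
    outliers = flat[outlier_idx]                 # stored losslessly in FP16
    mask = np.ones(flat.size, dtype=bool)
    mask[outlier_idx] = False
    inliers = flat[mask].astype(np.float32)
    scale = np.abs(inliers).max() / 7.0          # INT4 value range is [-8, 7]
    q = np.clip(np.round(inliers / scale), -8, 7).astype(np.int8)
    return q, scale, outlier_idx, outliers

def dequantize(q, scale, outlier_idx, outliers, shape):
    # Reassemble the tensor: dequantized inliers plus exact FP16 outliers.
    flat = np.empty(int(np.prod(shape)), dtype=np.float16)
    mask = np.ones(flat.size, dtype=bool)
    mask[outlier_idx] = False
    flat[mask] = (q.astype(np.float32) * scale).astype(np.float16)
    flat[outlier_idx] = outliers
    return flat.reshape(shape)
```

Because outliers are excluded from the scale computation, the worst-case error on the quantized values stays bounded by the (now much smaller) scale, which is the accuracy advantage over naive FP16→INT4 quantization.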
Problem

Research questions and friction points this paper is trying to address.

Reducing GPU memory footprint in LLM training
Optimizing memory management via adaptive compression
Minimizing recomputation overhead while preserving accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Layer-specific compression for LLM tensor outliers
MILP-based optimal scheduling for tensor optimization
Adaptive policy evolution during training
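The MILP-based scheduling idea can be sketched with a tiny exhaustive search: for each tensor, choose one of three actions (keep in GPU memory, compress, or drop and recompute) so that total time overhead is minimized under a memory budget. This brute-force stand-in is only a conceptual illustration of the objective and constraint; the paper's actual formulation is a mixed-integer linear program, and all cost numbers below are made up.

```python
from itertools import product

def best_plan(tensors, mem_budget):
    # Exhaustive stand-in for the MILP: pick one action per tensor that
    # minimizes total time overhead while staying within the memory budget.
    actions = ("keep", "compress", "recompute")
    best = None
    for choice in product(actions, repeat=len(tensors)):
        mem = sum(t[a]["mem"] for t, a in zip(tensors, choice))
        if mem > mem_budget:
            continue
        time = sum(t[a]["time"] for t, a in zip(tensors, choice))
        if best is None or time < best[0]:
            best = (time, choice)
    return best

# Two activation tensors with hypothetical memory (GB) and overhead (ms) costs.
tensors = [
    {"keep":      {"mem": 4, "time": 0},
     "compress":  {"mem": 1, "time": 2},
     "recompute": {"mem": 0, "time": 5}},
    {"keep":      {"mem": 4, "time": 0},
     "compress":  {"mem": 1, "time": 3},
     "recompute": {"mem": 0, "time": 4}},
]
```

With a generous 8 GB budget the plan keeps everything in memory at zero overhead; tightening the budget to 5 GB makes the solver compress the cheaper-to-compress tensor and keep the other, which mirrors how the per-tensor MILP trades memory for time.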