Preserving LLM Capabilities through Calibration Data Curation: From Analysis to Optimization

📅 2025-10-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Post-training compression of large language models (LLMs) often degrades complex reasoning capabilities, such as mathematical problem solving and code generation, due to suboptimal selection of calibration data. Method: This work is the first to systematically analyze, from the perspective of neuron activation patterns, how the representativeness and diversity of calibration data affect multi-dimensional model capabilities. We propose an activation-space-optimized calibration data construction framework that combines activation-based clustering for sample selection with diversity-enhancement techniques, and validate it jointly on pruning and quantization. Contribution/Results: Experiments show that the method significantly mitigates post-compression performance degradation on complex reasoning and generative tasks, effectively preserving the original model's capabilities. The implementation is publicly available.
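The summary's selection criterion, choosing calibration samples that are both representative of and diverse in activation space, can be illustrated with a minimal sketch. This is not the paper's COLA implementation; the function name and the greedy farthest-point strategy (a standard diversity heuristic) are assumptions for illustration only.

```python
import numpy as np

def select_calibration_samples(activations, k):
    """Greedy farthest-point selection in activation space: pick k samples
    whose pooled activation vectors are maximally spread out, a simple
    proxy for the diversity criterion described above."""
    chosen = [0]  # seed with the first candidate
    dist = np.linalg.norm(activations - activations[0], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dist))  # farthest sample from the chosen set
        chosen.append(nxt)
        # keep each sample's distance to its nearest chosen sample
        dist = np.minimum(dist, np.linalg.norm(activations - activations[nxt], axis=1))
    return chosen

# Toy demo: 100 candidate samples, each summarized by a 16-dim activation vector
rng = np.random.default_rng(0)
acts = rng.normal(size=(100, 16))
picks = select_calibration_samples(acts, k=8)
```

In practice the activation vectors would come from pooling a layer's hidden states over each candidate calibration sequence, rather than from random data.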

📝 Abstract
Post-training compression is a widely employed approach to scale down large language models (LLMs) and facilitate efficient inference. In various proposed compression methods, including pruning and quantization, calibration data plays a vital role by informing weight importance and activation dynamic ranges. However, how calibration data impacts LLM capabilities after compression remains underexplored. The few existing works that recognize the significance of this question investigate language modeling or commonsense reasoning degradation only from limited angles, such as data sources or sample counts. More systematic research is needed to examine the impacts on different LLM capabilities in terms of the compositional properties and domain correspondence of calibration data. In this work, we aim to bridge this gap and further analyze the underlying influencing mechanisms from the activation pattern perspective. In particular, we explore the calibration data's impact on high-level complex reasoning capabilities, such as math problem solving and code generation. Delving into the underlying mechanism, we find that representativeness and diversity in activation space more fundamentally determine the quality of calibration data. Finally, we propose a calibration data curation framework based on these observations and analyses, enhancing the ability of existing post-training compression methods to preserve critical LLM capabilities. Our code is available at https://github.com/BokwaiHo/COLA.git.
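The abstract notes that calibration data informs activation dynamic ranges during quantization. As a concrete illustration, here is a minimal sketch of deriving per-channel asymmetric 8-bit quantization parameters from calibration activations; the helper name and batch layout are assumptions, not an API from the paper's codebase.

```python
import numpy as np

def calibration_ranges(activation_batches):
    """Derive per-channel dynamic ranges and asymmetric 8-bit quantization
    parameters from calibration activations (each batch: [tokens, channels])."""
    lo = np.min([b.min(axis=0) for b in activation_batches], axis=0)
    hi = np.max([b.max(axis=0) for b in activation_batches], axis=0)
    scale = (hi - lo) / 255.0  # step size spanning uint8 levels 0..255
    zero_point = np.round(-lo / np.maximum(scale, 1e-12)).astype(np.int64)
    return lo, hi, scale, zero_point

# Two calibration batches of activations for a hypothetical 4-channel layer
rng = np.random.default_rng(1)
batches = [rng.normal(size=(32, 4)), rng.normal(size=(32, 4))]
lo, hi, scale, zp = calibration_ranges(batches)
```

Because the observed min/max depends entirely on which calibration samples were fed through the model, a poorly chosen calibration set yields clipped or wasted quantization ranges, which is exactly the sensitivity the paper studies.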
Problem

Research questions and friction points this paper is trying to address.

How calibration data affects LLM capabilities after compression
Systematically examining impacts on reasoning and domain correspondence
Developing optimization framework to preserve critical LLM capabilities
Innovation

Methods, ideas, or system contributions that make the work stand out.

Calibration data curation framework preserves LLM capabilities
Analyzes activation patterns for data representativeness and diversity
Optimizes post-training compression methods through systematic analysis
Bowei He
City University of Hong Kong, MBZUAI
Data Mining · Language Model · GenAI4Science · Agentic AI
Lihao Yin
Huawei, Hong Kong
Huiling Zhen
Huawei, Hong Kong
Shuqi Liu
Huawei, Hong Kong
Han Wu
Huawei, Hong Kong
Xiaokun Zhang
City University of Hong Kong, Dalian University of Technology
Data Mining · Recommendation · NLP
Mingxuan Yuan
Huawei, Hong Kong
Chen Ma
Department of Computer Science, City University of Hong Kong