AMAQ: Adaptive Mixed-bit Activation Quantization for Collaborative Parameter Efficient Fine-tuning

๐Ÿ“… 2025-10-06
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
To address the high communication overhead and limited computational resources of edge devices in client-server collaborative fine-tuning of large language models (LLMs), this paper proposes Adaptive Mixed-bit Activation Quantization (AMAQ). It introduces a dynamic bit-allocation mechanism guided by feature sensitivity and layer importance that jointly optimizes the quantization precision of activations and gradients, preventing representation collapse at ultra-low 3-4-bit widths and substantially improving training stability. Integrated into a parameter-efficient split learning framework with bit regularization, the approach enables progressive compression from 6-8 bits down to 3-4 bits. Experiments on LLaMA3-8B and Qwen2.5-7B show roughly a 2.5% gain in generation accuracy and a 1.3% gain in classification accuracy under the same bit budget, along with a substantial reduction in communication volume while preserving inference accuracy.
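The summary describes allocating per-channel bit widths from importance scores and then quantizing activations at those widths. A minimal sketch of that idea is below; the allocation heuristic (`avg_bits` plus a log-importance offset) and both function names are illustrative assumptions, not the paper's learned mechanism.

```python
import numpy as np

def allocate_bits(importance, avg_bits, b_min=3, b_max=8):
    """Assign each channel a bit width near `avg_bits`, shifted up or down
    by its normalized importance and clamped to [b_min, b_max].
    A simplified stand-in for AMAQ's learned bit allocation."""
    score = importance / importance.sum()
    # log2(score * n) is 0 for a uniformly important channel,
    # positive for above-average channels, negative for below-average ones.
    bits = np.clip(np.round(avg_bits + np.log2(score * len(score))),
                   b_min, b_max)
    return bits.astype(int)

def fake_quantize(x, bits):
    """Uniform per-channel fake quantization of activations x
    (shape [batch, channels]) at the allocated bit widths."""
    levels = 2 ** bits - 1
    lo, hi = x.min(axis=0), x.max(axis=0)
    scale = np.where(hi > lo, (hi - lo) / levels, 1.0)
    return np.round((x - lo) / scale) * scale + lo
```

Channels deemed more important (e.g. by activation variance) receive more levels, so their quantization error shrinks, while unimportant channels absorb the budget cut.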

๐Ÿ“ Abstract
Large Language Models (LLMs) are scaling rapidly, creating significant challenges for collaborative server-client distributed training, particularly in terms of communication efficiency and computational overhead. To address these challenges, we implement Parameter-efficient Split Learning, which effectively balances efficiency and performance for collaborative training on low-resource devices. To reduce communication overhead in collaborative training, we introduce Adaptive Mixed-bit Activation Quantization (AMAQ), a strategy that progressively compresses activations and gradients from high precision (6 to 8 bits) to low precision (3 to 4 bits). AMAQ achieves this by effectively allocating bit budgets across channels based on feature-wise and layer-wise importance using bit regularization. Under the same bit budgets, AMAQ outperforms fixed-precision approaches, delivering about 2.5% higher generation accuracy and about 1.3% better classification accuracy for models such as LLaMA3-8B and Qwen2.5-7B. In addition, it significantly enhances training stability and reduces ultra-low-bit representation collapse during training. Experiments demonstrate that AMAQ integrates effectively into practical multi-machine collaborative training setups, offering superior inference accuracy with only modest communication overhead for bit adaptation during training. This trade-off makes AMAQ a practical and effective solution for collaborative training with minimal communication cost.
Problem

Research questions and friction points this paper is trying to address.

Reducing communication overhead in distributed LLM training
Compressing activations and gradients adaptively across layers
Maintaining accuracy while minimizing collaborative training costs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive Mixed-bit Activation Quantization for communication efficiency
Progressive compression from high to low bit precision
Bit budget allocation based on feature and layer importance
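The progressive compression and bit-regularization ideas above can be sketched as a target-bit schedule plus a penalty on the mean of (learnable) per-channel bit widths. Both functions, the linear annealing curve, and the quadratic penalty form are hypothetical illustrations under stated assumptions, not the paper's exact formulation.

```python
import numpy as np

def bit_schedule(step, total_steps, b_start=8.0, b_end=3.0):
    """Linearly anneal the target average bit width from b_start to b_end
    over training, mirroring AMAQ's progressive 6-8 -> 3-4 bit compression
    (illustrative schedule, not the paper's curve)."""
    t = min(max(step / total_steps, 0.0), 1.0)
    return b_start + t * (b_end - b_start)

def bit_regularizer(bits, target, weight=1e-3):
    """Quadratic penalty pushing the mean of per-channel bit widths toward
    the scheduled target; a hypothetical form of bit regularization."""
    return weight * (np.mean(bits) - target) ** 2
```

During training, the regularizer would be added to the task loss so the allocation gradually trades precision for communication volume instead of dropping to low bits all at once.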
๐Ÿ”Ž Similar Papers
No similar papers found.