Rethinking 1-bit Optimization Leveraging Pre-trained Large Language Models

📅 2025-08-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing 1-bit quantization methods typically rely on from-scratch training, neglecting knowledge embedded in pretrained language models—leading to high computational costs and substantial accuracy degradation. This paper proposes a forward-and-backward consistent progressive binarization training framework, enabling high-performance 1-bit compression of large language models (LLMs) without requiring training from scratch. Its core contributions are: (1) binary-aware parameter initialization, which alleviates optimization difficulties induced by discrete constraints; and (2) a dual-scale compensation mechanism that explicitly models weight binarization bias separately in forward propagation and gradient backpropagation, enhancing both training stability and accuracy. The method supports end-to-end fine-tuning and is compatible with mainstream optimizers. Extensive experiments across multiple LLM scales demonstrate that the resulting 1-bit models closely match full-precision baselines in performance while drastically reducing memory footprint and computational overhead—establishing a new paradigm for efficient LLM deployment.
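The progressive conversion described above can be sketched as interpolating between the full-precision weights and their binarized form over the course of training. The function names, the linear schedule, and the `mean(|w|)` scaling below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def binarize(w):
    """1-bit quantization: sign(w) scaled by the mean absolute value,
    the common least-squares-optimal scale in binary networks
    (assumed here, not taken from the paper)."""
    alpha = np.mean(np.abs(w))
    return alpha * np.sign(w)

def progressive_weight(w, step, total_steps):
    """Smoothly blend full-precision weights into binarized ones.
    A linear ramp from 0 to 1 is assumed for illustration; the
    paper's actual schedule may differ."""
    lam = min(step / total_steps, 1.0)  # 0 at start, 1 at the end
    return (1.0 - lam) * w + lam * binarize(w)

w = np.array([0.3, -1.2, 0.7, -0.1])
print(progressive_weight(w, 0, 100))    # still full precision
print(progressive_weight(w, 100, 100))  # fully binarized
```

Because the blend starts at the pre-trained weights and tightens gradually, the optimizer never sees the abrupt full-precision-to-1-bit jump that the paper identifies as the main obstacle to direct adaptation.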

📝 Abstract
1-bit LLM quantization offers significant advantages in reducing storage and computational costs. However, existing methods typically train 1-bit LLMs from scratch, failing to fully leverage pre-trained models. This results in high training costs and notable accuracy degradation. We identify that the large gap between full-precision and 1-bit representations makes direct adaptation difficult. In this paper, we introduce a consistent progressive training scheme for both the forward and backward passes, smoothly converting floating-point weights into binarized ones. Additionally, we incorporate binary-aware initialization and dual-scaling compensation to ease progressive training and improve performance. Experimental results on LLMs of various sizes demonstrate that our method outperforms existing approaches. Our results show that high-performance 1-bit LLMs can be obtained from pre-trained models, eliminating the need for expensive training from scratch.
Problem

Research questions and friction points this paper is trying to address.

Reducing accuracy loss in 1-bit LLM quantization
Leveraging pre-trained models for efficient 1-bit conversion
Minimizing training costs for 1-bit LLM adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Progressive training for weight binarization
Binary-aware initialization for easier adaptation
Dual-scaling compensation to enhance performance
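One plausible reading of "dual-scaling compensation" is that the forward binarized weight and the backward gradient path each get their own scale factor. The sketch below combines this with the standard straight-through estimator (gradients pass through `sign()` as identity, clipped to the region |w| ≤ 1); both the split into `alpha`/`beta` and the clipping rule are assumptions for illustration, not the paper's exact method:

```python
import numpy as np

def binary_forward(w):
    """Forward pass: binarize with the least-squares optimal scale
    alpha = mean(|w|), a standard choice in binary networks."""
    alpha = np.mean(np.abs(w))
    return alpha * np.sign(w), alpha

def binary_backward(grad_out, w, alpha, beta=1.0):
    """Backward pass via a straight-through estimator: the gradient
    flows through sign() as if it were the identity, masked to the
    clip region |w| <= 1, then rescaled by a separate backward
    scale beta. Treating alpha (forward) and beta (backward) as
    independent scales is a hypothetical realization of the paper's
    dual-scaling idea."""
    ste_mask = (np.abs(w) <= 1.0).astype(w.dtype)
    return beta * alpha * grad_out * ste_mask

w = np.array([0.5, -2.0])
w_bin, alpha = binary_forward(w)       # [1.25, -1.25], alpha = 1.25
grad = binary_backward(np.ones(2), w, alpha)  # gradient blocked where |w| > 1
```

Modeling the binarization bias separately in the two directions is what lets the scheme stay compatible with ordinary gradient-based optimizers, as the summary notes.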