Unlocking Efficient Large Inference Models: One-Bit Unrolling Tips the Scales

📅 2025-02-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large Inference Models (LIMs) suffer from high computational overhead and substantial parameter-transmission costs. Method: the paper proposes a one-bit algorithm-unrolling framework that integrates physical-world priors. It combines one-bit quantization with interpretable unrolled architectures, embeds physics-based constraints to improve generalization, and provides theoretical guarantees on convergence, stability, and generalization bounds. Results: experiments demonstrate a parameter-transmission rate below 1.58 bits per connection while network depth and the compression ratio increase simultaneously, and both training and test performance significantly outperform the baselines. The work thus jointly optimizes computational efficiency and model accuracy without sacrificing theoretical rigor, advancing the practical deployment of LIMs in resource-constrained settings.
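The 1.58-bit figure comes from ternary weights: a uniform code over {-1, 0, +1} needs log2(3) ≈ 1.585 bits per weight, and sparsity (many zeros) pushes the empirical entropy below that bound. A minimal sketch of the idea, assuming BitNet b1.58-style absmean quantization; the helper names are illustrative, not taken from the paper:

```python
import math

def quantize_ternary(weights, eps=1e-8):
    """Absmean quantization to {-1, 0, +1} (b1.58 style):
    divide by the mean absolute value, then round and clip."""
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    return [max(-1, min(1, round(w / scale))) for w in weights], scale

def entropy_bits(symbols):
    """Empirical entropy in bits per symbol: a lower bound on the
    storage (or transmission) cost per quantized weight."""
    counts = {}
    for s in symbols:
        counts[s] = counts.get(s, 0) + 1
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

w = [0.9, -0.05, 0.4, -1.2, 0.02, 0.7, -0.6, 0.0]
q, scale = quantize_ternary(w)
# Small weights collapse to 0; a zero-heavy ternary code has
# empirical entropy below log2(3) ≈ 1.585 bits per weight.
print(q, entropy_bits(q))
```

The sparser the quantized weights, the further the effective bits-per-link drops below 1.58, which is the mechanism the summary's "under 1.58 bits per connection" claim relies on.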

📝 Abstract
Recent advancements in Large Language Model (LLM) compression, such as BitNet and BitNet b1.58, have marked significant strides in reducing the computational demands of LLMs through innovative one-bit quantization techniques. We extend this frontier by looking at Large Inference Models (LIMs) that have become indispensable across various applications. However, their scale and complexity often come at a significant computational cost. We introduce a novel approach that leverages one-bit algorithm unrolling, effectively integrating information from the physical world in the model architecture. Our method achieves a bit-per-link rate significantly lower than the 1.58 bits reported in prior work, thanks to the natural sparsity that emerges in our network architectures. We numerically demonstrate that the proposed one-bit algorithm unrolling scheme can improve both training and test outcomes by effortlessly increasing the number of layers while substantially compressing the network. Additionally, we provide theoretical results on the generalization gap, convergence rate, stability, and sensitivity of our proposed one-bit algorithm unrolling.
Problem

Research questions and friction points this paper is trying to address.

Reducing computational cost of Large Inference Models
Improving efficiency with one-bit algorithm unrolling
Enhancing model performance through network compression
Innovation

Methods, ideas, or system contributions that make the work stand out.

One-bit quantization reduces computational demands.
Algorithm unrolling integrates physical world information.
Sparse network architecture lowers bit-per-link rate.
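Algorithm unrolling turns each iteration of a model-based solver into a network layer, so the physical forward model is baked into the architecture. A minimal sketch under stated assumptions: unrolled ISTA for sparse recovery from y = Ax, with the learned layer matrix optionally replaced by its sign-quantized (one-bit/ternary) version. The solver choice and function names are illustrative, not the paper's actual architecture:

```python
def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def soft(v, t):
    """Soft-thresholding: shrink each entry toward zero by t."""
    return [max(abs(x) - t, 0.0) * (1.0 if x > 0 else -1.0) for x in v]

def sign_quantize(M):
    """One-bit (ternary) weights: keep only the sign of each entry
    plus one shared scale, here the mean absolute value."""
    flat = [abs(m) for row in M for m in row]
    scale = sum(flat) / len(flat)
    return [[scale * (1 if m > 0 else -1 if m < 0 else 0) for m in row]
            for row in M]

def unrolled_ista(y, A, layers=10, step=0.1, thresh=0.02, one_bit=False):
    """Unrolled ISTA: each 'layer' is one iteration
        x <- soft(x + step * W (y - A x), thresh)
    where W plays the role of A^T. With one_bit=True, W is replaced
    by its sign-quantized version, loosely mimicking a one-bit
    unrolled network; the forward model A stays exact, which is how
    physical-world information enters the architecture."""
    At = [list(col) for col in zip(*A)]
    W = sign_quantize(At) if one_bit else At
    x = [0.0] * len(At)
    for _ in range(layers):
        r = [yi - ai for yi, ai in zip(y, matvec(A, x))]
        g = matvec(W, r)
        x = soft([xi + step * gi for xi, gi in zip(x, g)], thresh)
    return x
```

Adding layers here means running more iterations of the same interpretable update, which is why depth can be scaled without changing the parameter budget per layer.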