ViM-VQ: Efficient Post-Training Vector Quantization for Visual Mamba

πŸ“… 2025-03-12
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Existing vector quantization (VQ) methods achieve 1- to 3-bit quantization on CNNs and Transformers but suffer severe accuracy degradation when applied directly to Visual Mamba networks (ViMs). This stems from outlier-heavy weight distributions in Mamba-based blocks, which amplify quantization error, and from inherent limitations of mainstream VQ: high memory overhead, lengthy calibration, and inefficient codeword search. Method: We propose ViM-VQ, a training-free post-training quantization framework tailored for ViMs. It introduces a fast convex combination optimization algorithm that efficiently updates both the convex combinations and the convex hulls to search for optimal codewords, and an incremental vector quantization strategy that progressively confirms optimal codewords to mitigate truncation errors. Contribution/Results: ViM-VQ achieves state-of-the-art accuracy at 1–3 bits across multiple vision tasks, significantly reduces memory footprint and inference latency, and enables efficient deployment of ViMs on edge devices.

πŸ“ Abstract
Visual Mamba networks (ViMs) extend the selective state space model (Mamba) to various vision tasks and demonstrate significant potential. Vector quantization (VQ), on the other hand, decomposes network weights into codebooks and assignments, significantly reducing memory usage and computational latency to enable the deployment of ViMs on edge devices. Although existing VQ methods have achieved extremely low-bit quantization (e.g., 3-bit, 2-bit, and 1-bit) in convolutional neural networks and Transformer-based networks, directly applying these methods to ViMs results in unsatisfactory accuracy. We identify several key challenges: 1) The weights of Mamba-based blocks in ViMs contain numerous outliers, significantly amplifying quantization errors. 2) When applied to ViMs, the latest VQ methods suffer from excessive memory consumption, lengthy calibration procedures, and suboptimal performance in the search for optimal codewords. In this paper, we propose ViM-VQ, an efficient post-training vector quantization method tailored for ViMs. ViM-VQ consists of two innovative components: 1) a fast convex combination optimization algorithm that efficiently updates both the convex combinations and the convex hulls to search for optimal codewords, and 2) an incremental vector quantization strategy that incrementally confirms optimal codewords to mitigate truncation errors. Experimental results demonstrate that ViM-VQ achieves state-of-the-art performance in low-bit quantization across various visual tasks.
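The abstract's description of VQ — decomposing weights into a codebook plus per-sub-vector assignments — can be sketched with a generic k-means codebook. This is a toy baseline for illustration, not ViM-VQ itself; the shapes and names are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy weight matrix standing in for one ViM projection layer
# (shape is illustrative, not taken from the paper).
W = rng.standard_normal((256, 64)).astype(np.float32)

d = 4                       # sub-vector dimension
k = 16                      # codebook size -> log2(16)/4 = 1 bit per weight
subvecs = W.reshape(-1, d)  # split the weights into d-dimensional sub-vectors

# Plain k-means codebook learning: a generic VQ baseline.
codebook = subvecs[rng.choice(len(subvecs), k, replace=False)].copy()
for _ in range(20):
    dists = ((subvecs[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    assign = dists.argmin(1)              # assignment index per sub-vector
    for j in range(k):
        members = subvecs[assign == j]
        if len(members):
            codebook[j] = members.mean(0)  # move codeword to cluster mean

W_hat = codebook[assign].reshape(W.shape)  # dequantized weights

bits_per_weight = np.log2(k) / d
print(f"{bits_per_weight:.1f} bits/weight, "
      f"MSE={np.mean((W - W_hat) ** 2):.4f}")
```

Storage drops from one float per weight to one small codebook plus a log2(k)-bit index per d weights, which is where the memory and latency savings described above come from.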
Problem

Research questions and friction points this paper is trying to address.

Addresses accuracy loss in low-bit quantization for Visual Mamba networks.
Reduces memory usage and computational latency for edge device deployment.
Overcomes challenges of outliers and inefficient codeword search in VQ methods.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fast convex combination optimization algorithm
Incremental vector quantization strategy
Efficient post-training vector quantization tailored for Visual Mamba networks
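The two components listed above can be illustrated with a minimal convex-combination search for a single weight sub-vector. This is my own sketch under assumed shapes, not the authors' algorithm: the sub-vector is approximated as a softmax-weighted convex combination of candidate codewords, the logits are optimized by gradient descent, and the dominant codeword is then "confirmed", loosely mirroring the incremental strategy:

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 4, 8
C = rng.standard_normal((k, d))  # candidate codewords (convex-hull vertices)
w = rng.standard_normal(d)       # one weight sub-vector to quantize

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

z = np.zeros(k)                  # logits parameterizing the convex weights
lr = 0.05
for _ in range(1000):
    p = softmax(z)               # convex coefficients (non-negative, sum to 1)
    approx = p @ C               # point inside the convex hull of the codewords
    g_approx = 2.0 * (approx - w)    # dL/d(approx) for L = ||w - approx||^2
    g_p = C @ g_approx               # chain rule through the combination
    g_z = p * (g_p - p @ g_p)        # softmax Jacobian-vector product
    z -= lr * g_z

p = softmax(z)
hard = C[p.argmax()]             # "confirm" the dominant codeword
print("soft error:", np.sum((w - p @ C) ** 2))
print("hard error:", np.sum((w - hard) ** 2))
```

The gap between the soft and hard errors is the truncation error that the paper's incremental confirmation strategy is designed to mitigate; here only the final hardening step is shown.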
πŸ”Ž Similar Papers
No similar papers found.
Juncan Deng
Zhejiang University, vivo Mobile Communication Co., Ltd
Shuaiting Li
Zhejiang University, vivo Mobile Communication Co., Ltd
Zeyu Wang
Zhejiang University
Kedong Xu
vivo Mobile Communication Co., Ltd
Hong Gu
Kejie Huang
Zhejiang University