Q-SAM2: Accurate Quantization for Segment Anything Model 2

📅 2025-06-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
SAM2's heavy computational and memory demands hinder deployment in resource-constrained scenarios. To address this, the authors propose Q-SAM2, a high-accuracy low-bit quantization framework with two key innovations: (1) a linear-layer calibration method that minimizes the Frobenius norm over a small image batch to reposition weight distributions for better low-bit initialization, and (2) a Quantization-Aware Training (QAT) pipeline that clips outliers and lets the network adapt to quantization thresholds during training. Together, these mitigate the accuracy collapse caused by singularities in the weight and activation distributions under extreme quantization. Experiments show that Q-SAM2 surpasses state-of-the-art general-purpose quantization schemes in both visual quality and quantitative metrics, especially at ultra-low 2-bit precision; the calibration technique is also effective in post-training quantization, improving mIoU by up to 66% over non-calibrated models.

📝 Abstract
The Segment Anything Model 2 (SAM2) has gained significant attention as a foundational approach for promptable image and video segmentation. However, its expensive computational and memory consumption poses a severe challenge for its application in resource-constrained scenarios. In this paper, we propose an accurate low-bit quantization method for efficient SAM2, termed Q-SAM2. To address the performance degradation caused by the singularities in weight and activation distributions during quantization, Q-SAM2 introduces two novel technical contributions. We first introduce a linear layer calibration method for low-bit initialization of SAM2, which minimizes the Frobenius norm over a small image batch to reposition weight distributions for improved quantization. We then propose a Quantization-Aware Training (QAT) pipeline that applies clipping to suppress outliers and allows the network to adapt to quantization thresholds during training. Our comprehensive experiments demonstrate that Q-SAM2 allows for highly accurate inference while substantially improving efficiency. Both quantitative and visual results show that our Q-SAM2 surpasses existing state-of-the-art general quantization schemes, especially for ultra-low 2-bit quantization. While designed for quantization-aware training, our proposed calibration technique also proves effective in post-training quantization, achieving up to a 66% mIoU accuracy improvement over non-calibrated models.
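The calibration idea in the abstract, repositioning quantized weights so that a layer's output matches the full-precision output in the Frobenius norm over a small calibration batch, can be illustrated with a minimal sketch. This is not the paper's exact formulation: the symmetric uniform quantizer, the closed-form per-channel rescaling, and all function names below are illustrative assumptions.

```python
import numpy as np

def uniform_quantize(w, n_bits):
    # Symmetric uniform quantizer with a single per-tensor scale
    # (quantize-dequantize, so the result stays in float).
    qmax = 2 ** (n_bits - 1) - 1
    max_abs = np.abs(w).max()
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return np.round(np.clip(w / scale, -qmax - 1, qmax)) * scale

def calibrate_linear(W, X, n_bits=2):
    """Rescale a quantized weight matrix per output channel so that
    X @ Wq.T approximates X @ W.T in the Frobenius norm, using a
    small calibration batch X (rows = samples)."""
    Wq = uniform_quantize(W, n_bits)
    Y_ref = X @ W.T    # full-precision layer output
    Y_q = X @ Wq.T     # quantized layer output
    # Closed-form per-channel scale: argmin_a ||Y_ref - a * Y_q||_F
    num = np.sum(Y_ref * Y_q, axis=0)
    den = np.sum(Y_q * Y_q, axis=0) + 1e-12
    alpha = num / den
    return Wq * alpha[:, None]
```

Because the per-channel scale is a least-squares optimum (and the uncalibrated case corresponds to a scale of 1), the calibrated weights can never increase the Frobenius-norm output error on the calibration batch.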
Problem

Research questions and friction points this paper is trying to address.

Reduces computational and memory costs of SAM2 for resource-constrained scenarios
Addresses performance degradation from weight and activation distribution singularities
Improves accuracy in ultra-low 2-bit quantization for image segmentation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Linear-layer calibration via Frobenius-norm minimization for low-bit weight initialization
Quantization-Aware Training with clipping to suppress outliers
Calibration also effective in post-training quantization (up to 66% mIoU gain over non-calibrated models)
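The second innovation, clipping outliers before quantization so the network can adapt to the thresholds during training, can be sketched as a fake-quantization forward pass. This is a rough illustration under assumptions, not the paper's pipeline: in real QAT the clip threshold would be learned with a straight-through estimator; here only the forward computation and a fixed threshold are shown, and the function name is hypothetical.

```python
import numpy as np

def fake_quant_clip(x, clip, n_bits=2):
    """Clip values to [-clip, clip], then apply symmetric uniform
    fake quantization (quantize-dequantize). Clipping keeps a rare
    outlier from inflating the quantization scale, which would
    otherwise collapse small values to zero at low bit-widths."""
    qmax = 2 ** (n_bits - 1) - 1
    xc = np.clip(x, -clip, clip)
    scale = clip / qmax
    return np.round(xc / scale) * scale
```

With `clip=0.5` at 2 bits, an outlier of 5.0 is saturated to 0.5 instead of stretching the scale to cover [-5, 5], so the quantization grid stays dense where most values actually lie.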