Weight Group-wise Post-Training Quantization for Medical Foundation Model

📅 2026-04-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of efficiently deploying large, computationally intensive medical foundation models on edge healthcare devices. To this end, the authors propose Permutation-COMQ, a post-training quantization method that eliminates the need for backpropagation by reducing the quantization process to dot-product and rounding operations. The approach introduces a weight-aware intra-layer permutation strategy that preserves channel structure while mitigating the accuracy degradation caused by ultra-low-bit quantization. Notably, Permutation-COMQ requires no hyperparameter tuning and achieves state-of-the-art performance across 2-bit, 4-bit, and 8-bit settings, improving both model accuracy and deployment efficiency at extremely low bitwidths.
📝 Abstract
Foundation models have achieved remarkable results in medical image analysis. However, their large network architectures and high computational complexity significantly impact inference speed, limiting their application on terminal medical devices. Quantization, a technique that compresses models into low-bit versions, is one solution to this challenge. In this paper, we propose a post-training quantization algorithm, Permutation-COMQ. It eliminates the need for backpropagation by using simple dot products and rounding operations, thereby removing hyperparameter tuning and simplifying the process. Additionally, we introduce a weight-aware strategy that reorders the weights within each layer to address the accuracy degradation induced by channel-wise scaling during quantization, while preserving channel structure. Experiments demonstrate that our method achieves the best results in 2-bit, 4-bit, and 8-bit quantization.
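The abstract describes two ingredients: backprop-free quantization built from dot products and rounding, and a weight-aware intra-layer reordering that groups channels of similar dynamic range before scales are chosen. The paper's exact algorithm is not reproduced here; the sketch below is only an illustrative, assumed implementation of the general idea (magnitude-sorted channel permutation followed by uniform group-wise round-to-nearest quantization, with the permutation undone afterwards so the layer's channel structure is preserved). The function name and grouping scheme are hypothetical, not from the paper.

```python
import numpy as np

def permute_and_quantize(W, bits=4, group_size=4):
    """Illustrative sketch (not the paper's algorithm): sort output
    channels by magnitude so channels sharing a quantization group have
    similar dynamic range, then quantize each group uniformly using only
    scaling, rounding, and clipping -- no backpropagation involved."""
    # Weight-aware permutation: order channels (rows) by peak magnitude.
    order = np.argsort(np.abs(W).max(axis=1))
    W_p = W[order]

    qmax = 2 ** (bits - 1) - 1  # e.g. 7 for 4-bit signed
    W_q = np.empty_like(W_p)
    for start in range(0, W_p.shape[0], group_size):
        grp = W_p[start:start + group_size]
        # One shared scale per group; guard against all-zero groups.
        scale = max(np.abs(grp).max() / qmax, 1e-12)
        W_q[start:start + group_size] = (
            np.clip(np.round(grp / scale), -qmax - 1, qmax) * scale
        )

    # Undo the permutation so channel structure matches the original layer.
    inverse = np.argsort(order)
    return W_q[inverse]
```

Sorting before grouping means a single outlier channel no longer inflates the scale of an entire group, which is the kind of channel-wise scaling degradation the abstract says the reordering is meant to counter.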
Problem

Research questions and friction points this paper is trying to address.

Medical Foundation Model
Model Compression
Quantization
Inference Speed
Edge Deployment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Post-Training Quantization
Weight Reordering
Channel-wise Scaling
Medical Foundation Model
Low-bit Quantization