Quantized Large Language Models in Biomedical Natural Language Processing: Evaluation and Recommendation

📅 2025-09-04
🤖 AI Summary
Deploying large language models (LLMs) in privacy-sensitive, resource-constrained clinical settings is hampered by prohibitive parameter counts, high computational overhead, and stringent data privacy requirements that rule out cloud inference. Addressing this challenge, this work systematically evaluates the quantization performance of 12 state-of-the-art LLMs across eight biomedical benchmarks. We propose a multi-task quantization evaluation framework covering named entity recognition, relation extraction, multi-label classification, and question answering, spanning both general-purpose and domain-specific models. Experimental results demonstrate that an appropriate quantization configuration reduces GPU memory consumption by up to 75% while preserving domain knowledge fidelity and task accuracy. Notably, we achieve efficient local inference for a 70B-parameter model on a single 40GB consumer-grade GPU. This study establishes a practical paradigm for lightweight, local deployment of biomedical LLMs, significantly advancing the translational readiness of AI in clinical applications.
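The headline figures are consistent with simple precision arithmetic: storing weights in 4 bits instead of 16 cuts the weight footprint by 75%. The back-of-envelope estimate below is a sketch added for illustration, not the paper's measurement methodology; it ignores activations, KV cache, and quantization metadata, which add overhead in practice.

```python
# Back-of-envelope weight-memory estimate for a dense LLM.
# Ignores activations, KV cache, and quantization metadata overhead,
# so real-world usage is somewhat higher than these figures.

def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Memory needed to hold the model weights alone, in GB."""
    return n_params * bits_per_weight / 8 / 1e9

params_70b = 70e9
fp16_gb = weight_memory_gb(params_70b, 16)  # ~140 GB: needs multiple GPUs
int4_gb = weight_memory_gb(params_70b, 4)   # ~35 GB: fits a single 40GB GPU

print(f"fp16: {fp16_gb:.0f} GB, 4-bit: {int4_gb:.0f} GB "
      f"({1 - int4_gb / fp16_gb:.0%} reduction)")
```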

📝 Abstract
Large language models have demonstrated remarkable capabilities in biomedical natural language processing, yet their rapidly growing size and computational requirements present a major barrier to adoption in healthcare settings, where data privacy precludes cloud deployment and resources are limited. In this study, we systematically evaluated the impact of quantization on 12 state-of-the-art large language models, including both general-purpose and biomedical-specific models, across eight benchmark datasets covering four key tasks: named entity recognition, relation extraction, multi-label classification, and question answering. We show that quantization substantially reduces GPU memory requirements, by up to 75%, while preserving model performance across diverse tasks, enabling the deployment of 70B-parameter models on 40GB consumer-grade GPUs. In addition, domain-specific knowledge and responsiveness to advanced prompting methods are largely maintained. These findings highlight quantization as a practical and effective strategy for the secure, local deployment of high-capacity language models in biomedical contexts, bridging the gap between technical advances in AI and real-world clinical translation.
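For concreteness, the following is a minimal sketch of the kind of local, quantized deployment the paper evaluates, using the Hugging Face transformers and bitsandbytes libraries with 4-bit NF4 weights. The toolchain and model ID are illustrative assumptions, not details taken from the paper.

```python
# Minimal 4-bit quantized loading sketch (assumes transformers and
# bitsandbytes are installed and a CUDA GPU is available).
# The model ID below is illustrative, not the paper's exact checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-3.1-70B-Instruct"  # any causal LM checkpoint

# NF4 quantization with bf16 compute: weights are stored in 4 bits and
# dequantized on the fly for each matrix multiplication.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # place layers on the available GPU(s)
)

# Toy biomedical QA-style prompt to exercise the quantized model.
prompt = "List two contraindications of metformin."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```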
Problem

Research questions and friction points this paper is trying to address.

Evaluating quantization impact on biomedical NLP models
Reducing GPU memory for healthcare deployment constraints
Maintaining performance in privacy-sensitive clinical settings
Innovation

Methods, ideas, or system contributions that make the work stand out.

Quantization reduces GPU memory by up to 75%
Enables 70B-parameter models on 40GB GPUs
Maintains performance across biomedical NLP tasks
👥 Authors

Zaifu Zhan
PhD at University of Minnesota; MS at Tsinghua University
Natural Language Processing, Machine Learning, AI for Biomedicine, Large Language Models

Shuang Zhou
Division of Computational Health Sciences, Department of Surgery, University of Minnesota, Minneapolis, MN, USA

Min Zeng
School of Computer Science and Engineering, Central South University
Bioinformatics, Machine Learning, Deep Learning

Kai Yu
Division of Computational Health Sciences, Department of Surgery, University of Minnesota, Minneapolis, MN, USA

Meijia Song
University of Minnesota
Nursing Informatics, Health Informatics

Xiaoyi Chen
Indiana University Bloomington
Machine Learning Security, Backdoors

Jun Wang
Division of Computational Health Sciences, Department of Surgery, University of Minnesota, Minneapolis, MN, USA

Yu Hou
Division of Computational Health Sciences, Department of Surgery, University of Minnesota, Minneapolis, MN, USA

Rui Zhang
Division of Computational Health Sciences, Department of Surgery, University of Minnesota, Minneapolis, MN, USA