🤖 AI Summary
Quantized large language models (e.g., INT4) suffer severe performance degradation on complex biomedical reasoning in resource-constrained clinical settings. Method: This paper proposes QM-ToT (Quantized Medical Tree-of-Thought), the first ToT framework adapted to biomedicine, integrating quantization-aware path decomposition and a multi-level evaluation mechanism; it further introduces a ToT-guided knowledge distillation strategy requiring only 3.9% of the training data. Results: On MedQA-USMLE, QM-ToT raises the accuracy of quantized LLaMA2-70B from 34% to 50% (+16 percentage points) and of LLaMA-3.1-8B from 58.77% to 69.49% (+10.72 percentage points); the ToT-based distillation achieves an 86.27% improvement over traditional distillation while using only 3.9% of the data. The framework significantly enhances the clinical reasoning of quantized models and empirically demonstrates the feasibility of deploying high-accuracy medical LLMs on edge healthcare devices.
📝 Abstract
Large language models (LLMs) face significant challenges in specialized biomedical tasks due to the inherent complexity of medical reasoning and the sensitive nature of clinical data. Existing LLMs often struggle with intricate medical terminology and the need for accurate clinical insights, and their performance degrades further when they are quantized for resource-constrained deployment. To address these issues, we propose Quantized Medical Tree of Thought (QM-ToT), a path-based reasoning framework. QM-ToT leverages a Tree of Thought (ToT) reasoning approach to decompose complex medical problems into manageable subtasks, coupled with evaluator assessment layers. This framework yields substantial performance improvements for INT4-quantized models on the challenging MedQA-USMLE dataset. Specifically, we demonstrate a remarkable accuracy increase from 34% to 50% for the LLaMA2-70B model and from 58.77% to 69.49% for LLaMA-3.1-8B. In addition, we propose an effective data distillation method based on ToT; compared to the traditional distillation method, it achieves an 86.27% improvement while using only 3.9% of the data. This work, for the first time, showcases the potential of ToT to significantly enhance performance on complex biomedical tasks, establishing a crucial foundation for future advances in deploying high-performing quantized LLMs in resource-limited medical settings.
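To make the "decompose into subtasks, then score with evaluator layers" idea concrete, here is a minimal beam-style Tree-of-Thought sketch. This is not the paper's implementation: `propose` and `evaluate` are hypothetical stand-ins for LLM calls (a quantized model proposing candidate reasoning steps, and an evaluator scoring partial paths), and the toy versions below are purely illustrative.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Path:
    steps: List[str]   # the reasoning steps taken so far (root = the question)
    score: float = 0.0 # evaluator's score for this partial path

def tree_of_thought(
    question: str,
    propose: Callable[[List[str]], List[str]],  # stand-in: LLM proposes next steps
    evaluate: Callable[[List[str]], float],     # stand-in: evaluator scores a path
    depth: int = 3,
    beam_width: int = 2,
) -> Path:
    """Expand candidate reasoning paths level by level; at each level the
    evaluator prunes all but the top `beam_width` paths."""
    beam = [Path(steps=[question])]
    for _ in range(depth):
        candidates = []
        for p in beam:
            for step in propose(p.steps):
                new_steps = p.steps + [step]
                candidates.append(Path(new_steps, evaluate(new_steps)))
        beam = sorted(candidates, key=lambda p: p.score, reverse=True)[:beam_width]
    return max(beam, key=lambda p: p.score)

# Toy stand-ins for demonstration only:
def toy_propose(steps: List[str]) -> List[str]:
    return [f"{steps[-1]}->a", f"{steps[-1]}->b"]

def toy_evaluate(steps: List[str]) -> float:
    # Arbitrary toy scoring: prefer paths containing more 'a' branches.
    return float(sum(s.count("a") for s in steps))

best = tree_of_thought("Q", toy_propose, toy_evaluate)
print(best.steps[-1])  # the deepest step on the best-scoring path
```

The key design point this illustrates is that the evaluator, not the generator, controls which reasoning branches survive, which is what lets a weaker (quantized) proposer still reach strong final answers.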