DentalGPT: Incentivizing Multimodal Complex Reasoning in Dentistry

📅 2025-12-12
🤖 AI Summary
To address two key bottlenecks of existing multimodal large language models (MLLMs) in dentistry, namely limited fine-grained visual understanding and insufficient clinical reasoning, this work introduces the largest annotated dental multimodal dataset to date, comprising over 120,000 images paired with detailed descriptions. It proposes a domain-specific two-stage training paradigm: (1) supervised training on this dataset to inject domain knowledge and improve visual understanding, followed by (2) reinforcement learning to strengthen multimodal diagnostic reasoning. The resulting lightweight (7B-parameter) dental MLLM, DentalGPT, improves fine-grained interpretation of panoramic radiographs and intraoral images. It outperforms many state-of-the-art MLLMs on dental visual question answering (VQA) and disease classification benchmarks. The work offers both a reusable methodology and a data resource for vertical-domain MLLM development.

📝 Abstract
Reliable interpretation of multimodal data in dentistry is essential for automated oral healthcare, yet current multimodal large language models (MLLMs) struggle to capture fine-grained dental visual details and lack sufficient reasoning ability for precise diagnosis. To address these limitations, we present DentalGPT, a specialized dental MLLM developed through high-quality domain knowledge injection and reinforcement learning. Specifically, we constructed the largest annotated multimodal dataset for dentistry to date by aggregating over 120k dental images paired with detailed descriptions that highlight diagnostically relevant visual features. Training on this dataset significantly enhances the MLLM's visual understanding of dental conditions, while the subsequent reinforcement learning stage further strengthens its capability for multimodal complex reasoning. Comprehensive evaluations on intraoral and panoramic benchmarks, along with dental subsets of medical VQA benchmarks, show that DentalGPT achieves superior performance in disease classification and dental VQA tasks, outperforming many state-of-the-art MLLMs despite having only 7B parameters. These results demonstrate that high-quality dental data combined with staged adaptation provides an effective pathway for building capable, domain-specialized dental MLLMs.
Problem

Research questions and friction points this paper is trying to address.

Develops a specialized dental MLLM for precise diagnosis
Enhances visual understanding and reasoning with dental data
Improves disease classification and VQA in dentistry benchmarks
Innovation

Methods, ideas, or system contributions that make the work stand out.

High-quality domain knowledge injection enhances visual understanding
Reinforcement learning strengthens multimodal complex reasoning capabilities
Staged adaptation with specialized data builds domain-specific MLLMs effectively
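
The reinforcement learning stage is described only at a high level here. As a rough illustration of how a reward signal for a disease classification question might be computed, the sketch below scores a model response against a reference label. The `<answer>` tag format, the function names, and the exact-match 0/1 reward are all assumptions made for illustration; the source does not specify the reward design.

```python
import re
from typing import Optional


def extract_answer(response: str) -> Optional[str]:
    """Return the text inside the last <answer>...</answer> tag, or None.

    The tag format is a hypothetical convention, not taken from the paper.
    """
    matches = re.findall(r"<answer>(.*?)</answer>", response, flags=re.DOTALL)
    return matches[-1].strip() if matches else None


def classification_reward(response: str, gold_label: str) -> float:
    """Hypothetical reward: 1.0 for a case-insensitive exact match, else 0.0."""
    predicted = extract_answer(response)
    if predicted is None:
        # No parsable answer: treat as incorrect.
        return 0.0
    return 1.0 if predicted.lower() == gold_label.strip().lower() else 0.0
```

A response such as `"The lesion shows enamel demineralization. <answer>Caries</answer>"` would receive a reward of 1.0 against the reference label `"caries"`, while a response with no answer tag would receive 0.0.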
Zhenyang Cai
The Chinese University of Hong Kong, Shenzhen
Large Language Models
Jiaming Zhang
The Chinese University of Hong Kong, Shenzhen
Junjie Zhao
Master's student, Peking University
CVML
Ziyi Zeng
The Chinese University of Hong Kong, Shenzhen
Yanchao Li
The Chinese University of Hong Kong, Shenzhen
Jingyi Liang
The Chinese University of Hong Kong, Shenzhen
Junying Chen
The Chinese University of Hong Kong, Shenzhen
Yunjin Yang
The Chinese University of Hong Kong, Shenzhen
Jiajun You
The Chinese University of Hong Kong, Shenzhen; Freedom AI
Shuzhi Deng
Shenzhen Stomatology Hospital (Pingshan) of Southern Medical University
Tongfei Wang
Shenzhen Stomatology Hospital (Pingshan) of Southern Medical University
Wanting Chen
Shenzhen Stomatology Hospital (Pingshan) of Southern Medical University
Chunxiu Hao
Shenzhen Stomatology Hospital (Pingshan) of Southern Medical University
Ruiqi Xie
Shenzhen Stomatology Hospital (Pingshan) of Southern Medical University
Zhenwei Wen
Division of Applied Oral Sciences & Community Dental Care, Faculty of Dentistry, The University of Hong Kong
Xiangyi Feng
Freedom AI
Zou Ting
Shenzhen Stomatology Hospital (Pingshan) of Southern Medical University
Jin Zou Lin
Shenzhen Stomatology Hospital (Pingshan) of Southern Medical University
Jianquan Li
Freedom AI
Guangjun Yu
The Chinese University of Hong Kong, Shenzhen; National Health Data Institute, Shenzhen
Liangyi Chen
State Key Laboratory of Membrane Biology, Beijing Key Laboratory of Cardiometabolic Molecular Medicine, Institute of Molecular Medicine, National Biomedical Imaging Center, School of Future Technology, Peking University
Junwen Wang
Faculty of Dentistry, The University of Hong Kong
Bioinformatics, Computational Genomics, Systems Biology, Precision Dentistry, Precision Medicine
Shan Jiang
Shenzhen Stomatology Hospital (Pingshan) of Southern Medical University
Benyou Wang
Assistant Professor, The Chinese University of Hong Kong, Shenzhen
large language models, natural language processing, information retrieval, applied machine learning