Towards Better Dental AI: A Multimodal Benchmark and Instruction Dataset for Panoramic X-ray Analysis

📅 2025-09-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing large vision-language models (LVLMs) perform well on general medical tasks but show significant limitations in dental panoramic radiograph analysis, owing to the lack of domain-specific multimodal data and of standardized evaluation benchmarks tailored to dense anatomical structures and subtle pathological features. To address this gap, we introduce MMOral, the first multimodal instruction-tuning dataset for oral imaging, comprising 20,563 panoramic X-ray images and 1.3 million expert-curated instructions, alongside MMOral-Bench, a comprehensive evaluation benchmark. We further propose OralGPT, a fine-tuned variant of Qwen2.5-VL-7B that achieves a 24.73% improvement over its baseline after a single epoch of supervised fine-tuning. Notably, GPT-4o scores only 41.45% on MMOral-Bench, underscoring the task's difficulty and the need for domain adaptation. This work establishes a reproducible foundation for dental multimodal AI, providing a curated dataset, a rigorous evaluation protocol, and an effective methodology for developing specialized medical LVLMs.

📝 Abstract
Recent advances in large vision-language models (LVLMs) have demonstrated strong performance on general-purpose medical tasks. However, their effectiveness in specialized domains such as dentistry remains underexplored. In particular, panoramic X-rays, a widely used imaging modality in oral radiology, pose interpretative challenges due to dense anatomical structures and subtle pathological cues, which are not captured by existing medical benchmarks or instruction datasets. To this end, we introduce MMOral, the first large-scale multimodal instruction dataset and benchmark tailored for panoramic X-ray interpretation. MMOral consists of 20,563 annotated images paired with 1.3 million instruction-following instances across diverse task types, including attribute extraction, report generation, visual question answering, and image-grounded dialogue. In addition, we present MMOral-Bench, a comprehensive evaluation suite covering five key diagnostic dimensions in dentistry. We evaluate 64 LVLMs on MMOral-Bench and find that even the best-performing model, i.e., GPT-4o, only achieves 41.45% accuracy, revealing significant limitations of current models in this domain. To promote the progress of this specific domain, we also propose OralGPT, which conducts supervised fine-tuning (SFT) upon Qwen2.5-VL-7B with our meticulously curated MMOral instruction dataset. Remarkably, a single epoch of SFT yields substantial performance enhancements for LVLMs, e.g., OralGPT demonstrates a 24.73% improvement. Both MMOral and OralGPT hold significant potential as a critical foundation for intelligent dentistry and enable more clinically impactful multimodal AI systems in the dental field. The dataset, model, benchmark, and evaluation suite are available at https://github.com/isbrycee/OralGPT.
Problem

Research questions and friction points this paper is trying to address.

Addressing limited LVLM effectiveness in specialized dental domain applications
Overcoming interpretative challenges of panoramic X-rays with dense anatomical structures
Filling gaps in existing medical benchmarks for dental X-ray analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Created multimodal instruction dataset for panoramic X-rays
Developed comprehensive benchmark suite for dental diagnostics
Proposed OralGPT model with supervised fine-tuning approach
👥 Authors

Jing Hao
Faculty of Dentistry, The University of Hong Kong
Yuxuan Fan
Peking University
Yanpeng Sun
Nanjing University of Science and Technology
Kaixin Guo
Faculty of Dentistry, The University of Hong Kong
Lizhuo Lin
Faculty of Dentistry, The University of Hong Kong
Jinrong Yang
CVTE, Sun Yat-sen University
Qi Yong H. Ai
Department of Diagnostic Radiology, The University of Hong Kong
Lun M. Wong
Imaging and Interventional Radiology, Faculty of Medicine, The Chinese University of Hong Kong
Hao Tang
School of Computer Science, Peking University
Kuo Feng Hung
Faculty of Dentistry, The University of Hong Kong