OralMLLM-Bench: Evaluating Cognitive Capabilities of Multimodal Large Language Models in Dental Practice

📅 2026-05-02

🤖 AI Summary
This study addresses the lack of systematic evaluation of the hierarchical cognitive capabilities of multimodal large language models (MLLMs) in dental radiographic analysis. It proposes a benchmark aligned with clinical dental reasoning that covers three imaging modalities (periapical, panoramic, and lateral cephalometric radiographs) and four cognitive levels: perception, comprehension, prediction, and decision-making. The benchmark comprises 27 clinical tasks derived from public datasets, manually curated annotations, and 3,820 clinician assessments. Experiments with six frontier MLLMs, including GPT-5.2 and GLM-4.6, quantify the performance gap relative to clinicians and characterize model strengths, limitations, and failure patterns. The findings yield recommendations for developing safe, reliable AI systems for clinical dentistry.
📝 Abstract
Multimodal large language models (MLLMs) have emerged as a promising paradigm for dental image analysis. However, their ability to capture the multi-level cognitive processes required for radiographic analysis remains unclear. Here, we present a comprehensive benchmark to evaluate the cognitive capabilities of MLLMs in dental radiographic analysis. It spans three critical imaging modalities, i.e., periapical, panoramic, and lateral cephalometric radiographs, and defines four cognitive categories: perception, comprehension, prediction, and decision-making. The benchmark comprises 27 clinically grounded tasks derived from public datasets, with manually curated annotations and 3,820 clinician assessments for evaluation. Six frontier MLLMs, including GPT-5.2 and GLM-4.6, are evaluated. We demonstrate the performance gap between MLLMs and clinicians in dental practice, delineate model strengths and limitations, characterize failure patterns, and provide recommendations for improvement. This data resource will facilitate the development of next-generation artificial intelligence systems aligned with clinical cognition, safety requirements, and workflow complexity in dental practice.
Problem

Research questions and friction points this paper is trying to address.

Multimodal Large Language Models
Dental Radiographic Analysis
Cognitive Capabilities
Clinical Benchmarking
Artificial Intelligence in Dentistry
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal large language models
dental radiographic analysis
cognitive benchmark
clinical evaluation
OralMLLM-Bench
Rongyang Wang
Department of Orthodontics, Beijing Stomatological Hospital, School of Stomatology, Capital Medical University, Fanjiacun Road #9, Fengtai District, Beijing 100070, China
Shuang Zhou
University of Minnesota, Hong Kong Polytechnic University
Biomedical Informatics, Large Language Models, AI for Healthcare, Electronic Health Record
Jiashuo Wang
Department of Computing, The Hong Kong Polytechnic University, Hong Kong SAR, China
Wenya Xie
College of Science and Engineering, University of Minnesota, Minneapolis, MN, USA
Xiaoxia Che
Department of Orthodontics, Beijing Stomatological Hospital, School of Stomatology, Capital Medical University, Fanjiacun Road #9, Fengtai District, Beijing 100070, China