OralGPT-Plus: Learning to Use Visual Tools via Reinforcement Learning for Panoramic X-ray Analysis

📅 2026-03-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current vision-language models analyze panoramic dental X-rays in a single static pass, so they lack the fine-grained spatial reasoning, bilateral symmetry awareness, and multi-step diagnostic verification that clinical reliability requires. To address this, the work proposes OralGPT-Plus, an agentic vision-language model that supports iterative re-examination and symmetry-aware reasoning. The authors introduce DentalProbe, a 5,000-image dataset of expert-curated diagnostic trajectories that supervise localized inspection and contralateral comparison, along with MMOral-X, a benchmark of 300 open-ended questions with region-level annotations across multiple difficulty levels. They further design a reinspection-driven reinforcement learning framework that combines a rubric-based reward with a conditioned diagnosis-driven reward to encourage clinically meaningful re-examination and stabilize long-horizon reasoning. Experiments show consistent improvements over strong baselines on MMOral-X and on established panoramic benchmarks.

📝 Abstract
Panoramic dental radiographs require fine-grained spatial reasoning, bilateral symmetry understanding, and multi-step diagnostic verification, yet existing vision-language models operate under a static single-pass paradigm that limits their clinical reliability. In this paper, we introduce OralGPT-Plus, an agentic vision-language model designed to perform iterative and symmetry-aware diagnostic reasoning for panoramic dental radiograph analysis. To support this paradigm, we construct DentalProbe, a five-thousand-image dataset with expert-curated diagnostic trajectories that provide structured supervision for localized inspection and contralateral comparison. We further develop a Reinspection-driven reinforcement learning framework that encourages clinically meaningful re-examination and stabilizes long-horizon reasoning with rubric-based reward and conditioned diagnostic-driven reward. In parallel, we present MMOral-X, the first benchmark for holistic panoramic diagnosis, containing 300 open-ended questions and region-level annotations across multiple difficulty levels. OralGPT-Plus demonstrates consistent and reliable improvements over strong baselines on MMOral-X and established panoramic benchmarks, indicating the effectiveness of interactive and symmetry-informed reasoning. Our work highlights the value of agentic modeling for dental imaging and provides a foundation for future research in clinically aligned panoramic radiograph analysis.
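
The contralateral comparison mentioned in the abstract can be illustrated with a small geometric sketch. The code below is not the authors' tool implementation; it assumes, as a simplification, that the anatomical midline coincides with the vertical center of the image, and the names `Box` and `contralateral_box` are hypothetical. Given a region on one side of a panoramic radiograph, it returns the mirrored region on the opposite side, which is the kind of crop an agent might request for a left-right comparison.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned region in pixel coordinates (hypothetical helper type)."""
    x0: int  # left edge
    y0: int  # top edge
    x1: int  # right edge
    y1: int  # bottom edge


def contralateral_box(box: Box, image_width: int) -> Box:
    """Mirror a box across the vertical center line of the image.

    Assumption: the patient's midline sits at image_width / 2. Real
    panoramic images may be off-center, so this is only an approximation.
    """
    # Reflecting x -> image_width - x swaps the left and right edges,
    # so the old right edge becomes the new left edge and vice versa.
    return Box(image_width - box.x1, box.y0, image_width - box.x0, box.y1)


# Example: a region on the left of a 1000-pixel-wide image
# maps to the matching region on the right.
region = Box(100, 50, 300, 200)
mirrored = contralateral_box(region, image_width=1000)
print(mirrored)
```

Applying the reflection twice returns the original box, which gives a quick sanity check for any midline estimate substituted for the image center.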
Problem

Research questions and friction points this paper is trying to address.

panoramic X-ray analysis
vision-language models
clinical reliability
diagnostic reasoning
bilateral symmetry
Innovation

Methods, ideas, or system contributions that make the work stand out.

agentic vision-language model
reinspection-driven reinforcement learning
symmetry-aware reasoning
diagnostic trajectory
panoramic dental radiograph analysis
Authors

Yuxuan Fan
Peking University
Natural Language Processing

Jing Hao
Faculty of Dentistry, The University of Hong Kong

Hong Chen
The Hong Kong University of Science and Technology (Guangzhou)
Large Language Models, Multi-modal LLMs, Efficient LLMs

Jiahao Bao
Shanghai Jiao Tong University

Yihua Shao
Institute of Automation, Chinese Academy of Sciences

Yuci Liang
College of Computer Science and Software Engineering, Shenzhen University

Kuo Feng Hung
Faculty of Dentistry, The University of Hong Kong

Hao Tang
Peking University
Computer Vision