DinoDental: Benchmarking DINOv3 as a Unified Vision Encoder for Dental Image Analysis

📅 2026-03-30
🤖 AI Summary
This study addresses the scarcity and high cost of annotated data in dental imaging by systematically evaluating how well the general-purpose vision foundation model DINOv3 transfers to two dental imaging modalities: panoramic radiographs and intraoral photographs. The authors introduce DinoDental, a unified benchmark built from multiple public datasets, and assess DINOv3 on classification, detection, and instance segmentation under three adaptation strategies (frozen feature extraction, full fine-tuning, and low-rank adaptation, LoRA), without any domain-specific pretraining. The experiments show that DINOv3 remains competitive across tasks and has its clearest advantages in intraoral image understanding and boundary-sensitive dense prediction, supporting its use as a strong unified encoder for dental AI applications.
📝 Abstract
The scarcity and high cost of expert annotations in dental imaging present a significant challenge for the development of AI in dentistry. DINOv3, a state-of-the-art, self-supervised vision foundation model pre-trained on 1.7 billion images, offers a promising pathway to mitigate this issue. However, its reliability when transferred to the dental domain, with its unique imaging characteristics and clinical subtleties, remains unclear. To address this, we introduce DinoDental, a unified benchmark designed to systematically evaluate whether DINOv3 can serve as a reliable, off-the-shelf encoder for comprehensive dental image analysis without requiring domain-specific pre-training. Constructed from multiple public datasets, DinoDental covers a wide range of tasks, including classification, detection, and instance segmentation on both panoramic radiographs and intraoral photographs. We further analyze the model's transfer performance by scaling its size and input resolution, and by comparing different adaptation strategies, including frozen features, full fine-tuning, and the parameter-efficient Low-Rank Adaptation (LoRA) method. Our experiments show that DINOv3 can serve as a strong unified encoder for dental image analysis across both panoramic radiographs and intraoral photographs, remaining competitive across tasks while showing particularly clear advantages for intraoral image understanding and boundary-sensitive dense prediction. Collectively, DinoDental provides a systematic framework for comprehensively evaluating DINOv3 in dental analysis, establishing a foundational benchmark to guide efficient and effective model selection and adaptation for the dental AI community.
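To make the trade-off between the three adaptation strategies named in the abstract more concrete, the following minimal sketch counts trainable parameters for each one. It assumes a simplified, generic ViT encoder (12 blocks, width 768, MLP ratio 4), a linear task head, and LoRA applied to the query and value projections with rank 8. All of these choices, and the function names, are illustrative assumptions and are not taken from the paper or from DINOv3's actual architecture. Patch embedding, positional parameters, and the final norm are ignored.

```python
# Rough trainable-parameter counts for three adaptation strategies.
# Assumption: a simplified generic ViT block, NOT DINOv3's exact layout.

def block_params(dim: int, mlp_ratio: int = 4) -> int:
    """Parameters in one transformer block (weights + biases)."""
    attn = 4 * dim * dim + 4 * dim          # qkv (3d^2 + 3d) + output proj (d^2 + d)
    hidden = mlp_ratio * dim
    mlp = dim * hidden + hidden + hidden * dim + dim
    norms = 2 * 2 * dim                     # two LayerNorms, scale + shift each
    return attn + mlp + norms


def trainable_params(strategy: str, depth: int = 12, dim: int = 768,
                     num_classes: int = 2, rank: int = 8) -> int:
    """Trainable parameters for 'frozen', 'full', or 'lora' adaptation."""
    head = dim * num_classes + num_classes  # linear task head, always trained
    if strategy == "frozen":
        # Encoder weights are fixed; only the head is updated.
        return head
    if strategy == "full":
        # Every encoder weight is updated along with the head.
        return depth * block_params(dim) + head
    if strategy == "lora":
        # Each adapted projection gets A (rank x dim) and B (dim x rank).
        # Assumption: adapters on the query and value projections only.
        per_block = 2 * rank * (dim + dim)
        return depth * per_block + head
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    for name in ("frozen", "lora", "full"):
        print(f"{name:>6}: {trainable_params(name):,} trainable parameters")
```

Under these assumptions, LoRA trains well under one percent of the parameters that full fine-tuning updates, which is why it is described as parameter-efficient. The counts say nothing about accuracy; that comparison is what the benchmark itself measures.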
Problem

Research questions and friction points this paper is trying to address.

dental image analysis
vision foundation model
self-supervised learning
domain transfer
benchmarking
Innovation

Methods, ideas, or system contributions that make the work stand out.

DINOv3
self-supervised learning
dental image analysis
foundation model transfer
parameter-efficient adaptation
👥 Authors
Kun Tang
Shenzhen University
Xinquan Yang
Shenzhen University
Mianjie Zheng
Shenzhen University
Xuefen Liu
Shenzhen University
Xuguang Li
Information Management School, Shandong University of Technology; Nankai University
information and knowledge management, social media, knowledge innovation
Xiaoqi Guo
Shenzhen University
Ruihan Chen
Chongqing University
Linlin Shen
Shenzhen University
Deep Learning, Computer Vision, Facial Analysis/Recognition, Medical Image Analysis
He Meng
Shenzhen University General Hospital