DinoDental: Benchmarking DINOv3 as a Unified Vision Encoder for Dental Image Analysis

📅 2026-03-30

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This study addresses the scarcity and high cost of annotated data in dental imaging by systematically evaluating how well the general-purpose vision foundation model DINOv3 transfers to multimodal dental images, including panoramic radiographs and intraoral photographs. The authors introduce DinoDental, a unified benchmark for dental AI, and assess DINOv3’s performance on classification, detection, and instance segmentation under three adaptation strategies (frozen feature extraction, full fine-tuning, and low-rank adaptation, or LoRA), without any domain-specific pretraining. The results show that DINOv3 remains competitive across tasks, with particularly clear advantages in intraoral image understanding and boundary-sensitive dense prediction, which supports its use as a strong unified encoder for dental AI applications.


📝 Abstract

The scarcity and high cost of expert annotations in dental imaging present a significant challenge for the development of AI in dentistry. DINOv3, a state-of-the-art, self-supervised vision foundation model pre-trained on 1.7 billion images, offers a promising pathway to mitigate this issue. However, its reliability when transferred to the dental domain, which has distinctive imaging characteristics and clinical subtleties, remains unclear. To address this, we introduce DinoDental, a unified benchmark designed to systematically evaluate whether DINOv3 can serve as a reliable, off-the-shelf encoder for comprehensive dental image analysis without requiring domain-specific pre-training. Constructed from multiple public datasets, DinoDental covers a wide range of tasks, including classification, detection, and instance segmentation on both panoramic radiographs and intraoral photographs. We further analyze the model's transfer performance by scaling its size and input resolution, and by comparing different adaptation strategies, including frozen features, full fine-tuning, and the parameter-efficient Low-Rank Adaptation (LoRA) method. Our experiments show that DINOv3 can serve as a strong unified encoder for dental image analysis across both panoramic radiographs and intraoral photographs, remaining competitive across tasks while showing particularly clear advantages for intraoral image understanding and boundary-sensitive dense prediction. Collectively, DinoDental provides a systematic framework for comprehensively evaluating DINOv3 in dental analysis, establishing a foundational benchmark to guide efficient and effective model selection and adaptation for the dental AI community.
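The three adaptation strategies named in the abstract differ mainly in which weights are updated during training. The sketch below is a minimal NumPy illustration, not the authors' code: a hypothetical `LoRALinear` wraps a frozen weight with a trainable low-rank update scaled by alpha/r, and a hypothetical `trainable_counts` helper tallies trainable parameters for frozen features (linear head only), LoRA, and full fine-tuning. The backbone dimensions (width 768, 12 blocks, MLP ratio 4, roughly ViT-B sized), the LoRA rank and alpha, the choice to adapt only the query and value projections, and the 5-class head are all assumptions for illustration; the paper's actual settings are not stated here.

```python
import numpy as np


class LoRALinear:
    """Hypothetical helper: frozen linear layer W plus a trainable low-rank update B @ A."""

    def __init__(self, weight, bias, rank=8, alpha=16.0, seed=0):
        rng = np.random.default_rng(seed)
        self.weight = weight  # frozen pre-trained weight, shape (d_out, d_in)
        self.bias = bias      # frozen pre-trained bias, shape (d_out,)
        d_out, d_in = weight.shape
        # A gets a small random init and B starts at zero, so the adapted
        # layer initially reproduces the frozen layer exactly.
        self.A = rng.normal(0.0, 0.01, size=(rank, d_in))
        self.B = np.zeros((d_out, rank))
        self.scale = alpha / rank

    def __call__(self, x):
        # x: (n_tokens, d_in) -> (n_tokens, d_out)
        frozen = x @ self.weight.T + self.bias
        update = (x @ self.A.T) @ self.B.T
        return frozen + self.scale * update

    def trainable_params(self):
        # Only A and B are trained; weight and bias stay frozen.
        return self.A.size + self.B.size

    def merged_weight(self):
        # Fold the low-rank update into one dense weight for inference.
        return self.weight + self.scale * (self.B @ self.A)


def trainable_counts(embed_dim=768, depth=12, mlp_ratio=4, rank=8, num_classes=5):
    """Hypothetical helper: trainable parameters per strategy for an assumed ViT-like encoder."""
    d = embed_dim
    per_block = (
        d * 3 * d + 3 * d                       # fused qkv projection
        + d * d + d                             # attention output projection
        + d * mlp_ratio * d + mlp_ratio * d     # MLP fc1
        + mlp_ratio * d * d + d                 # MLP fc2
        + 4 * d                                 # two LayerNorms (scale + shift)
    )
    head = d * num_classes + num_classes        # linear classification head
    backbone = depth * per_block                # ignores patch embedding and positional terms
    lora = depth * 2 * rank * (d + d)           # LoRA on query and value (each d -> d)
    return {"frozen": head, "lora": lora + head, "full": backbone + head}


if __name__ == "__main__":
    for name, n in trainable_counts().items():
        print(f"{name}: {n:,} trainable parameters")
```

Because B starts at zero, the adapted layer initially matches the frozen one, and after training the update can be folded into the weight with `merged_weight()`, so LoRA need not add inference cost. The counts ignore the patch embedding and positional parameters and cover only a classification head; detection and instance-segmentation heads would add their own trainable parameters under every strategy.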

Problem

Research questions and friction points this paper is trying to address.

dental image analysis

vision foundation model

self-supervised learning

domain transfer

benchmarking

Innovation

Methods, ideas, or system contributions that make the work stand out.

DINOv3

self-supervised learning

dental image analysis