🤖 AI Summary
To address the lack of dedicated foundation models for digital breast tomosynthesis (DBT), this paper introduces DBT-DINO, the first foundation model for DBT. It is pretrained in-domain with the self-supervised DINOv2 methodology on more than 25 million 2D slices drawn from 487,975 DBT volumes (27,990 patients). The model is evaluated on three clinical tasks: breast density classification (accuracy = 0.79 versus 0.73 for DINOv2 and 0.74 for DenseNet-121; *p* < 0.001 for both), 5-year breast cancer risk prediction (AUROC = 0.78 versus 0.76 for DINOv2; *p* = 0.57, not statistically significant), and lesion detection (average sensitivity = 0.62 versus 0.67 for DINOv2; *p* = 0.60). On cancerous lesions specifically, DBT-DINO reached a detection rate of 78.8% versus 77.3% for DINOv2. The results indicate that in-domain pretraining helps most clearly on density classification, while its benefit for localized detection is variable, which points to a need for further methodological development on detection tasks.
📝 Abstract
Foundation models have shown promise in medical imaging but remain underexplored for three-dimensional imaging modalities. No foundation model currently exists for Digital Breast Tomosynthesis (DBT), despite its use for breast cancer screening.
This study aimed to develop and evaluate a foundation model for DBT (DBT-DINO) across multiple clinical tasks and to assess the impact of domain-specific pre-training.
Self-supervised pre-training was performed using the DINOv2 methodology on over 25 million 2D slices from 487,975 DBT volumes from 27,990 patients. Three downstream tasks were evaluated: (1) breast density classification using 5,000 screening exams; (2) 5-year risk of developing breast cancer using 106,417 screening exams; and (3) lesion detection using 393 annotated volumes.
For breast density classification, DBT-DINO achieved an accuracy of 0.79 (95% CI: 0.76–0.81), outperforming both the Meta AI DINOv2 baseline (0.73, 95% CI: 0.70–0.76, p<.001) and DenseNet-121 (0.74, 95% CI: 0.71–0.76, p<.001). For 5-year breast cancer risk prediction, DBT-DINO achieved an AUROC of 0.78 (95% CI: 0.76–0.80) compared to DINOv2's 0.76 (95% CI: 0.74–0.78, p=.57). For lesion detection, DINOv2 achieved a higher average sensitivity of 0.67 (95% CI: 0.60–0.74) compared to DBT-DINO's 0.62 (95% CI: 0.53–0.71, p=.60). On cancerous lesions specifically, DBT-DINO performed better, with a detection rate of 78.8% compared to DINOv2's 77.3%.
Using a dataset of unprecedented size, we developed DBT-DINO, the first foundation model for DBT. DBT-DINO demonstrated strong performance on breast density classification and cancer risk prediction. However, domain-specific pre-training showed variable benefits on the detection task, with the ImageNet baseline outperforming DBT-DINO on general lesion detection, indicating that localized detection tasks require further methodological development.
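The results above report each metric with a 95% confidence interval, but the abstract does not say how those intervals were computed. As an illustration only, the sketch below shows one common approach, a percentile bootstrap over exams, applied to classification accuracy. The function names (`bootstrap_ci`, `accuracy`), the resample count, and the synthetic four-class labels are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of exams whose predicted class matches the label."""
    return float(np.mean(y_true == y_pred))


def bootstrap_ci(y_true, y_pred, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for metric(y_true, y_pred).

    Exams are resampled with replacement; this is an assumed procedure,
    not necessarily the one used in the paper.
    """
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    n = len(y_true)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # indices of one bootstrap resample
        stats[b] = metric(y_true[idx], y_pred[idx])
    lower, upper = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return metric(y_true, y_pred), float(lower), float(upper)


# Synthetic stand-in for a four-class density test set (hypothetical data).
data_rng = np.random.default_rng(42)
y_true = data_rng.integers(0, 4, size=1000)
is_correct = data_rng.random(1000) < 0.79
# Wrong predictions are shifted to the next class so they never match.
y_pred = np.where(is_correct, y_true, (y_true + 1) % 4)

point, lower, upper = bootstrap_ci(y_true, y_pred, accuracy)
print(f"accuracy = {point:.2f} (95% CI: {lower:.2f}-{upper:.2f})")
```

The same `bootstrap_ci` helper accepts any metric with the signature `metric(y_true, y_pred)`, so an AUROC function could be passed in place of `accuracy` for the risk-prediction task.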