Subgroup Performance of a Commercial Digital Breast Tomosynthesis Model for Breast Cancer Detection

📅 2025-03-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Commercial digital breast tomosynthesis (DBT) AI models lack detailed real-world subgroup evaluation across demographic, imaging, and pathologic subgroups. Method: The study performs a stratified binary classification assessment of the commercially deployed Lunit INSIGHT DBT model on 163,449 retrospective screening exams from the Emory Breast Imaging Dataset (EMBED), comparing 1,368 screen-detected cancers against 162,081 negative exams. Performance is reported as AUC, precision, and recall with 95% confidence intervals across prespecified subgroups, including age, breast density, calcification presence, and cancer subtype (especially non-invasive carcinoma). Contribution/Results: The model achieved an overall AUC of 0.91 (precision 0.08, recall 0.73). Performance was robust across demographic subgroups but significantly lower for non-invasive carcinoma (AUC = 0.85), cases with calcifications (AUC = 0.80), and dense breasts (AUC = 0.90). These results mark performance boundaries of a current clinical DBT AI system and support detailed subgroup evaluation before clinical deployment.
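The stratified evaluation described above can be illustrated with a minimal sketch. Nothing here is the study's actual code: the record fields (`label`, `score`, `density`), the 0.5 threshold, and the function names are hypothetical, and the pairwise AUC is written for readability on small inputs, not for a cohort of this size. The sketch also splits all exams by one attribute, which suits demographic or density subgroups; pathologic subgroups such as cancer subtype apply only to positive exams and would need the negative exams shared across groups.

```python
from collections import defaultdict

def auc(labels, scores):
    """Rank-based AUC: chance a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return float("nan")  # undefined when a class is missing
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def precision_recall(labels, scores, threshold):
    """Precision and recall when exams scoring at or above `threshold` are flagged."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall

def metrics_by_subgroup(records, key, threshold):
    """Group exam records by `key` and compute the three metrics per group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    results = {}
    for name, rows in groups.items():
        labels = [r["label"] for r in rows]
        scores = [r["score"] for r in rows]
        precision, recall = precision_recall(labels, scores, threshold)
        results[name] = {
            "n": len(rows),
            "auc": auc(labels, scores),
            "precision": precision,
            "recall": recall,
        }
    return results

# Toy records (hypothetical values, not study data).
records = [
    {"label": 1, "score": 0.9, "density": "dense"},
    {"label": 0, "score": 0.2, "density": "dense"},
    {"label": 1, "score": 0.4, "density": "dense"},
    {"label": 0, "score": 0.6, "density": "dense"},
    {"label": 1, "score": 0.8, "density": "non-dense"},
    {"label": 0, "score": 0.1, "density": "non-dense"},
    {"label": 0, "score": 0.3, "density": "non-dense"},
]
for group, m in metrics_by_subgroup(records, "density", threshold=0.5).items():
    print(group, m)
```

A production analysis would more likely use a sort-based AUC or an established statistics library; the pairwise loop is kept only because it matches the definition of AUC directly.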

📝 Abstract
While research has established the potential of AI models for mammography to improve breast cancer screening outcomes, no detailed subgroup evaluations have been performed to assess the strengths and weaknesses of commercial models for digital breast tomosynthesis (DBT) imaging. This study presents a granular evaluation of the Lunit INSIGHT DBT model on a large retrospective cohort of 163,449 screening mammography exams from the Emory Breast Imaging Dataset (EMBED). Model performance was evaluated in a binary context with various negative exam types (162,081 exams) compared against screen-detected cancers (1,368 exams) as the positive class. The analysis was stratified across demographic, imaging, and pathologic subgroups to identify potential disparities. The model achieved an overall AUC of 0.91 (95% CI: 0.90-0.92) with a precision of 0.08 (95% CI: 0.08-0.08) and a recall of 0.73 (95% CI: 0.71-0.76). Performance was found to be robust across demographics, but cases with non-invasive cancers (AUC: 0.85, 95% CI: 0.83-0.87), calcifications (AUC: 0.80, 95% CI: 0.78-0.82), and dense breast tissue (AUC: 0.90, 95% CI: 0.88-0.91) were associated with significantly lower performance compared to other groups. These results highlight the need for detailed evaluation of model characteristics and vigilance in considering adoption of new tools for clinical deployment.
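The low precision alongside a high AUC follows largely from class imbalance, which a short back-of-the-envelope calculation makes concrete. The sketch below uses only the counts and rounded metrics quoted in the abstract; the function name `implied_counts` is hypothetical, and because precision is rounded to two decimals, the derived counts are rough estimates and not figures reported by the study.

```python
def implied_counts(n_pos, n_neg, recall, precision):
    """Approximate confusion-matrix quantities implied by headline metrics."""
    prevalence = n_pos / (n_pos + n_neg)   # share of exams that are cancers
    tp = recall * n_pos                    # cancers flagged by the model
    flagged = tp / precision               # all exams flagged (TP + FP)
    fp = flagged - tp                      # negative exams flagged
    specificity = 1 - fp / n_neg           # share of negatives not flagged
    return {
        "prevalence": prevalence,
        "true_positives": tp,
        "flagged": flagged,
        "false_positives": fp,
        "specificity": specificity,
    }

# Inputs taken from the abstract; outputs are approximate because the
# published precision and recall are rounded.
estimate = implied_counts(n_pos=1_368, n_neg=162_081, recall=0.73, precision=0.08)
for name, value in estimate.items():
    print(f"{name}: {value:.4f}")
```

With cancers making up under 1% of exams, even a small false-positive rate among the negatives produces far more false positives than true positives, which is why precision stays low.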
Problem

Research questions and friction points this paper is trying to address.

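Evaluates a commercial AI model (Lunit INSIGHT DBT) for breast cancer detection on DBT screening exams.
Assesses performance disparities across demographic, imaging, and pathologic subgroups in digital breast tomosynthesis.
Identifies lower performance in non-invasive cancers, calcifications, and dense breast tissue.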
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluated the commercially deployed Lunit INSIGHT DBT model.
Analyzed 163,449 screening mammography exams from EMBED retrospectively.
Stratified the analysis across demographic, imaging, and pathologic subgroups.
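Each subgroup metric in this summary is reported with a 95% confidence interval. The summary does not say how those intervals were computed, so the percentile bootstrap below is an assumption chosen for illustration, not the authors' method. The helper names (`recall_at`, `bootstrap_ci`), the threshold, and the toy data are all hypothetical.

```python
import random

def recall_at(threshold):
    """Build a metric: fraction of positive exams scored at or above `threshold`."""
    def metric(labels, scores):
        pos = [s for y, s in zip(labels, scores) if y == 1]
        if not pos:
            return None  # undefined for a resample with no positives
        return sum(s >= threshold for s in pos) / len(pos)
    return metric

def bootstrap_ci(labels, scores, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval: resample exams with replacement, recompute the metric."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(labels)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        value = metric([labels[i] for i in idx], [scores[i] for i in idx])
        if value is not None:
            stats.append(value)
    if not stats:
        raise ValueError("metric was undefined on every resample")
    stats.sort()
    lo = stats[int((alpha / 2) * (len(stats) - 1))]
    hi = stats[int((1 - alpha / 2) * (len(stats) - 1))]
    return lo, hi

# Toy subgroup (hypothetical values, not study data): 1 = cancer, 0 = negative.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.2, 0.6, 0.1, 0.3, 0.2, 0.5]
low, high = bootstrap_ci(labels, scores, recall_at(0.5))
print(f"recall 95% CI: {low:.2f}-{high:.2f}")
```

Small subgroups yield wide intervals under this scheme, which is one reason subgroup comparisons need the interval and not just the point estimate.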
Authors
Beatrice Brown-Mulry
HITI Lab, Emory University, Atlanta, GA, USA
Rohan Isaac
HITI Lab, Emory University, Atlanta, GA, USA
Sang Hyup Lee
Lunit, Seoul, South Korea
Ambika Seth
Lunit, Seoul, South Korea
KyungJee Min
Lunit, Seoul, South Korea
T. Dapamede
HITI Lab, Emory University, Atlanta, GA, USA
Frank Li
HITI Lab, Emory University, Atlanta, GA, USA
Aawez Mansuri
HITI Lab, Emory University, Atlanta, GA, USA
MinJae Woo
Clemson University
Data Science
Christian Allison Fauria-Robinson
Emory University, Atlanta, GA, USA
Bhavna Paryani
Emory University, Atlanta, GA, USA
J. Gichoya
HITI Lab, Emory University, Atlanta, GA, USA
Hari Trivedi
Emory University
Deep Learning, Radiology, Mammography, AI, Natural Language Processing