AI Summary
Problem: Radiomics-based AI models for breast cancer diagnosis offer limited interpretability and align poorly with clinical reporting standards such as BI-RADS.
Method: We propose BM1.0, a dual-dictionary framework that establishes the first semantic mapping between radiomic features (RFs) and BI-RADS lexicon terms by combining clinical prior knowledge with SHAP-guided, data-driven discovery. Applied to dynamic contrast-enhanced MRI for classifying triple-negative breast cancer (TNBC) versus non-TNBC, BM1.0 systematically evaluates 27 machine learning classifiers in combination with 27 feature selection strategies, including variance inflation factor (VIF) analysis to reduce multicollinearity and improve feature stability.
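VIF analysis is the one feature selection step named explicitly above. As a rough illustration of how a VIF filter is commonly applied (not the study's code), the sketch below computes VIF_j = 1/(1 - R_j^2) by regressing each feature on all the others with NumPy, then iteratively drops the feature with the largest VIF until every remaining value is at or below a cutoff. The cutoff of 5.0 and the helper names `variance_inflation_factors` and `vif_select` are assumptions for illustration; the summary does not state the threshold BM1.0 uses.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from an ordinary least-squares
    regression (with intercept) of column j on all other columns of X."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        ss_tot = np.sum((y - y.mean()) ** 2)
        if ss_tot == 0:  # a constant column is collinear with the intercept
            vifs[j] = np.inf
            continue
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / ss_tot
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs


def vif_select(X, names, threshold=5.0):
    """Repeatedly drop the feature with the largest VIF until every remaining
    feature has VIF <= threshold (the threshold is an assumed rule of thumb)."""
    X = np.asarray(X, dtype=float)
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        vifs = variance_inflation_factors(X[:, keep])
        worst = int(np.argmax(vifs))  # index into the current `keep` list
        if vifs[worst] <= threshold:
            break
        del keep[worst]
    return [names[i] for i in keep]


# Synthetic example (not study data): c is almost exactly a + b.
rng = np.random.default_rng(0)
a, b = rng.normal(size=200), rng.normal(size=200)
c = a + b + rng.normal(scale=0.01, size=200)
X = np.column_stack([a, b, c])
print(np.round(variance_inflation_factors(X), 1))
print(vif_select(X, ["a", "b", "c"]))
```

Because all three features are nearly linear combinations of each other, every VIF starts out very large; removing any one of them leaves two features whose VIFs fall well below the cutoff, so the filter stops after a single removal.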
Results: The best-performing model, VIF feature selection followed by an Extra Trees classifier, achieves a mean cross-validated accuracy of 0.83. It identifies interpretable RFs, such as *Sphericity* and *Busyness*, that have explicit BI-RADS semantic counterparts, confirming established imaging biomarkers and revealing potential TNBC-specific markers. This improves the clinical trustworthiness and decision transparency of radiomics models.
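A mean cross-validated accuracy is the average of the per-fold accuracies. The sketch below shows one common way to obtain such a figure for an Extra Trees classifier using scikit-learn's `ExtraTreesClassifier`, `StratifiedKFold`, and `cross_val_score`. The data are synthetic (from `make_classification`), and the fold count, number of trees, and helper name `mean_cv_accuracy` are assumptions; the summary does not report the study's cross-validation protocol. In a complete pipeline, any feature selection such as a VIF filter would ideally be refit inside each training fold so that the validation fold does not influence which features are kept; that step is omitted here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score


def mean_cv_accuracy(X, y, n_splits=5, seed=0):
    """Return the mean and per-fold accuracies of an Extra Trees classifier
    under shuffled, stratified k-fold cross-validation (assumed settings)."""
    clf = ExtraTreesClassifier(n_estimators=300, random_state=seed)
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
    return float(np.mean(scores)), scores


# Synthetic stand-in for a filtered radiomic feature matrix; not study data.
X, y = make_classification(n_samples=400, n_features=20, n_informative=5, random_state=0)
mean_acc, fold_scores = mean_cv_accuracy(X, y)
print(f"fold accuracies: {np.round(fold_scores, 3)}")
print(f"mean CV accuracy: {mean_acc:.3f}")
```

Stratified folds keep the class ratio roughly constant across splits, which matters when one class (here, an assumed minority class such as TNBC) is less frequent.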
Abstract
Radiomics-based AI models show promise for breast cancer diagnosis but often lack interpretability, which limits clinical adoption. This study addresses the gap between radiomic features (RFs) and the standardized BI-RADS lexicon by proposing a dual-dictionary framework. First, a Clinically-Informed Feature Interpretation Dictionary (CIFID) was created by mapping 56 RFs to BI-RADS descriptors (shape, margin, and internal enhancement) through literature and expert review. The framework was applied to classify triple-negative breast cancer (TNBC) versus non-TNBC using dynamic contrast-enhanced MRI from a multi-institutional cohort of 1,549 patients. We trained 27 machine learning classifiers in combination with 27 feature selection methods. SHapley Additive exPlanations (SHAP) were used to interpret predictions and to generate a complementary Data-Driven Feature Interpretation Dictionary (DDFID) for 52 additional RFs. The best model, which combined Variance Inflation Factor (VIF) feature selection with an Extra Trees classifier, achieved a mean cross-validation accuracy of 0.83. Key predictive RFs aligned with clinical knowledge: higher Sphericity (round/oval shape) and lower Busyness (more homogeneous enhancement) were associated with TNBC. The framework confirmed known imaging biomarkers and uncovered novel, interpretable associations. This dual-dictionary approach (BM1.0) enhances AI model transparency and supports the integration of RFs into routine breast cancer diagnosis and personalized care.
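The two dictionaries can be pictured as lookup tables keyed by feature name. The sketch below is a hypothetical data structure, not the authors' implementation: clinically informed (CIFID) entries take precedence, and data-driven (DDFID) entries record effect direction from the sign of the correlation between a feature's values and its SHAP values, which is one simple heuristic among several. The `FeatureEntry` class, the helper functions, the feature name "FeatureX", and the synthetic SHAP values are all illustrative assumptions. Only the Sphericity and Busyness readings come from the abstract; real SHAP values would come from an explainer applied to the trained model.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass(frozen=True)
class FeatureEntry:
    """One dictionary entry (hypothetical schema, for illustration only)."""
    feature: str         # radiomic feature name
    birads_term: str     # BI-RADS descriptor category the feature maps to
    interpretation: str  # plain-language reading of the feature's value
    source: str          # "CIFID" (clinically informed) or "DDFID" (data-driven)


# Illustrative CIFID entries; the readings follow the abstract's two examples.
CIFID = {
    "Sphericity": FeatureEntry("Sphericity", "shape",
                               "higher value: more round/oval shape", "CIFID"),
    "Busyness": FeatureEntry("Busyness", "internal enhancement",
                             "lower value: more homogeneous enhancement", "CIFID"),
}


def shap_direction(feature_values, shap_values) -> int:
    """Heuristic: +1 if higher feature values push predictions toward the
    positive class, -1 if lower values do, 0 if no linear trend is found
    (sign of the Pearson correlation between values and SHAP values)."""
    r = np.corrcoef(feature_values, shap_values)[0, 1]
    if not np.isfinite(r) or r == 0:
        return 0
    return 1 if r > 0 else -1


def build_ddfid_entry(name, feature_values, shap_values, positive_label="TNBC"):
    """Create a hypothetical data-driven entry for a feature with no CIFID mapping."""
    direction = {1: "higher", -1: "lower"}.get(shap_direction(feature_values, shap_values))
    text = (f"{direction} value associated with {positive_label}"
            if direction else "no clear monotonic association")
    return FeatureEntry(name, "none (data-driven)", text, "DDFID")


def lookup(name: str, ddfid: dict) -> Optional[FeatureEntry]:
    """Prefer the clinically informed entry; fall back to the data-driven one."""
    return CIFID.get(name) or ddfid.get(name)


# Usage with synthetic data (not study output): SHAP values that rise with the feature.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
phi = 0.5 * x + rng.normal(scale=0.1, size=100)
ddfid = {"FeatureX": build_ddfid_entry("FeatureX", x, phi)}
print(lookup("Sphericity", ddfid).source)        # CIFID
print(lookup("FeatureX", ddfid).interpretation)  # higher value associated with TNBC
```

Giving the clinically informed entry priority keeps the clinician-reviewed wording whenever a feature appears in both tables, while the data-driven table still covers features the literature review did not map.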