🤖 AI Summary
This study addresses the out-of-distribution (OOD) generalization challenge in multi-center mammography analysis. It is presented as the first systematic evaluation of invariant learning methods, namely Invariant Risk Minimization (IRM) and Variance Risk Extrapolation (VRE), on real-world, publicly available multi-site mammography datasets for breast cancer risk estimation. Methodologically, IRM and VRE are adapted to whole-image classification and benchmarked against standard Empirical Risk Minimization (ERM); interpretability is examined via Class Activation Mapping (CAM) and visualization of the learned representations. Results demonstrate that invariant learning significantly improves cross-site AUC and average precision, effectively mitigating spurious correlations; however, performance remains limited at sites with small sample sizes. This work establishes the first empirical OOD generalization benchmark for multi-site mammogram classification, validating the clinical applicability of invariant learning while clarifying its mechanistic advantages and practical limitations.
📝 Abstract
Despite significant progress in robust deep learning techniques for mammogram breast cancer classification, their reliability in real-world clinical development settings remains uncertain. The translation of these models to clinical practice faces challenges due to variations in medical centers, imaging protocols, and patient populations. To enhance their robustness, invariant learning methods have been proposed, prioritizing causal factors over misleading features. However, their effectiveness in clinical development and impact on mammogram classification require investigation. This paper reassesses the application of invariant learning for breast cancer risk estimation based on mammograms. It is the first study in this area to draw on diverse multi-site public datasets. The objective is to evaluate the benefits of invariant learning in developing robust models. Two invariant learning methods, Invariant Risk Minimization and Variance Risk Extrapolation, are compared quantitatively against Empirical Risk Minimization. Evaluation metrics include accuracy, average precision, and area under the curve. Additionally, interpretability is examined through class activation maps and visualization of learned representations. This research examines the advantages, limitations, and challenges of invariant learning for mammogram classification, guiding future studies to develop generalized methods for breast cancer prediction on whole mammograms in out-of-domain scenarios.
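To make the comparison between the three objectives concrete, the following is a minimal sketch of how ERM, an IRMv1-style penalty, and a variance-of-risks penalty in the spirit of Variance Risk Extrapolation could be computed from per-site logits for a binary task. It is not the paper's implementation. The function names (`bce`, `irm_penalty`, `objectives`), the penalty weights, the choice to average rather than sum across sites, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce(logits, labels):
    """Mean binary cross-entropy from raw logits: log(1 + e^z) - y*z."""
    return float(np.mean(np.logaddexp(0.0, logits) - labels * logits))


def irm_penalty(logits, labels):
    """IRMv1-style penalty for one site (illustrative).

    Squared gradient of the site risk with respect to a dummy scale w
    applied to the logits, evaluated at w = 1. For cross-entropy this
    gradient has the closed form mean((sigmoid(z) - y) * z).
    """
    grad = np.mean((sigmoid(logits) - labels) * logits)
    return float(grad ** 2)


def objectives(env_logits, env_labels, irm_weight=1.0, vrex_weight=1.0):
    """Return ERM, IRM-style, and variance-penalized objectives.

    Each element of env_logits / env_labels holds the data of one site.
    The weights are hypothetical hyperparameters, not values from the paper.
    """
    risks = np.array([bce(z, y) for z, y in zip(env_logits, env_labels)])
    penalties = [irm_penalty(z, y) for z, y in zip(env_logits, env_labels)]
    erm = risks.mean()                        # average of per-site risks
    irm = erm + irm_weight * np.mean(penalties)
    vrex = erm + vrex_weight * risks.var()    # penalize risk variance across sites
    return {"erm": float(erm), "irm": float(irm), "vrex": float(vrex)}


# Toy usage with two hypothetical sites of different sizes.
rng = np.random.default_rng(0)
site_logits = [rng.normal(size=8), rng.normal(size=12)]
site_labels = [
    rng.integers(0, 2, size=8).astype(float),
    rng.integers(0, 2, size=12).astype(float),
]
print(objectives(site_logits, site_labels))
```

Both penalties are non-negative, so each invariant objective is bounded below by the ERM term; the variance penalty vanishes only when every site has the same risk. In training, these quantities would be computed with an automatic differentiation framework so that gradients flow back to the network weights.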