Mammo-SAE: Interpreting Breast Cancer Concept Learning with Sparse Autoencoders

📅 2025-07-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
In breast cancer imaging, the black-box nature of vision-language foundation models such as Mammo-CLIP impedes clinical trust. To address this, we bring sparse autoencoders (SAEs) to interpretability research for mammographic foundation models, proposing Mammo-SAE to produce a disentangled view of Mammo-CLIP's latent space. Using patch-level feature activation analysis and downstream probing, we identify highly specific latent neurons encoding key clinical concepts, such as "mass" and "suspicious calcifications", and empirically validate their spatial alignment with ground-truth lesion regions. We also detect confounding factors that influence classification decisions, including acquisition-related artifacts and tissue overlap. By exposing which latent features drive predictions, the approach makes model decisions more transparent and verifiable, supporting clinical credibility. This work establishes a new paradigm for interpretability research in medical foundation models, bridging representation learning and clinically grounded reasoning.

📝 Abstract
Interpretability is critical in high-stakes domains such as medical imaging, where understanding model decisions is essential for clinical adoption. In this work, we introduce Sparse Autoencoder (SAE)-based interpretability to breast imaging by analyzing Mammo-CLIP, a vision-language foundation model pretrained on large-scale mammogram image-report pairs. We train a patch-level Mammo-SAE on Mammo-CLIP to identify and probe latent features associated with clinically relevant breast concepts such as "mass" and "suspicious calcification". Our findings reveal that the top-activated class-level latent neurons in the SAE latent space often align with ground-truth regions, and they also uncover several confounding factors influencing the model's decision-making process. Additionally, we analyze which latent neurons the model relies on during downstream finetuning to improve breast concept prediction. This study highlights the promise of interpretable SAE latent representations in providing deeper insight into the internal workings of foundation models at every layer for breast imaging.
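
Below is a minimal, hedged sketch of the kind of patch-level sparse autoencoder the abstract describes: a single-layer encoder/decoder trained to reconstruct frozen Mammo-CLIP patch embeddings under an L1 sparsity penalty. The class name, feature dimensions, expansion factor, and loss weight are illustrative assumptions, not values from the paper.

```python
# Minimal sketch, assuming patch embeddings have already been extracted
# from a frozen Mammo-CLIP vision encoder. d_model, the expansion factor,
# and l1_weight are illustrative placeholders, not the paper's settings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchSAE(nn.Module):
    def __init__(self, d_model: int = 512, expansion: int = 8):
        super().__init__()
        d_latent = d_model * expansion            # overcomplete latent space
        self.encoder = nn.Linear(d_model, d_latent)
        self.decoder = nn.Linear(d_latent, d_model)

    def forward(self, x: torch.Tensor):
        # x: (num_patches, d_model) frozen patch embeddings
        z = F.relu(self.encoder(x))               # sparse, non-negative codes
        x_hat = self.decoder(z)
        return x_hat, z

def sae_loss(x, x_hat, z, l1_weight: float = 1e-3):
    # Reconstruction term plus an L1 penalty: sparsity pushes individual
    # latent neurons to specialize, which is what makes probing meaningful.
    return F.mse_loss(x_hat, x) + l1_weight * z.abs().mean()

# Toy usage with random stand-ins for real Mammo-CLIP patch features.
patches = torch.randn(1024, 512)
sae = PatchSAE()
x_hat, z = sae(patches)
loss = sae_loss(patches, x_hat, z)
loss.backward()
```
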
Problem

Research questions and friction points this paper is trying to address.

Interpreting breast cancer concepts in medical imaging models
Identifying latent features linked to clinical breast concepts
Analyzing model decision-making for breast concept prediction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse Autoencoder for breast imaging interpretability
Patch-level Mammo-SAE analyzes Mammo-CLIP features
Identifies latent neurons linked to clinical concepts (a probing sketch follows this list)
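
As a companion to the items above, here is a hedged sketch of how concept-linked neurons could be identified by probing: rank SAE latents by how much more they activate on patches from concept-positive images than on the rest. The function and variable names are hypothetical; the paper's exact probing protocol may differ.

```python
# Hedged sketch of concept-level probing: score each SAE latent neuron by
# how much more it activates on patches from concept-positive images than
# on the remaining patches. Names and shapes are hypothetical.
import torch

def top_concept_neurons(z: torch.Tensor, concept_mask: torch.Tensor, k: int = 10):
    """z: (num_patches, d_latent) SAE activations;
    concept_mask: (num_patches,) bool, True for concept-positive patches."""
    mean_pos = z[concept_mask].mean(dim=0)    # activation on concept patches
    mean_neg = z[~concept_mask].mean(dim=0)   # baseline on all other patches
    return torch.topk(mean_pos - mean_neg, k).indices

# Toy usage: random activations and labels stand in for real data. The
# selected neurons' patch activations can then be reshaped onto the image
# grid and compared against ground-truth lesion regions.
z = torch.rand(1024, 4096)
mask = torch.rand(1024) > 0.9
print(top_concept_neurons(z, mask))
```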