Mammo-SAE: Interpreting Breast Cancer Concept Learning with Sparse Autoencoders

📅 2025-07-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
In breast cancer imaging, the black-box nature of vision-language foundation models such as Mammo-CLIP impedes clinical trust. To address this, we bring sparse autoencoders (SAEs) to interpretability research for mammographic foundation models, proposing Mammo-SAE to produce a disentangled view of Mammo-CLIP's latent space. Using patch-level feature activation analysis and downstream probing, we identify highly specific latent neurons encoding key clinical concepts, such as "mass" and "suspicious calcifications", and empirically validate their spatial alignment with ground-truth lesion regions. We also detect confounding factors that influence classification decisions, including acquisition-related artifacts and tissue overlap. By exposing which latent features drive predictions, the approach makes model decisions more transparent and verifiable, supporting clinical credibility. This work establishes a new paradigm for interpretability research in medical foundation models, bridging representation learning and clinically grounded reasoning.

📝 Abstract
Interpretability is critical in high-stakes domains such as medical imaging, where understanding model decisions is essential for clinical adoption. In this work, we introduce Sparse Autoencoder (SAE)-based interpretability to breast imaging by analyzing Mammo-CLIP, a vision-language foundation model pretrained on large-scale mammogram image-report pairs. We train a patch-level Mammo-SAE on Mammo-CLIP to identify and probe latent features associated with clinically relevant breast concepts such as "mass" and "suspicious calcification". Our findings reveal that the top-activated class-level latent neurons in the SAE latent space often align with ground-truth regions, and they also uncover several confounding factors influencing the model's decision-making process. Additionally, we analyze which latent neurons the model relies on during downstream finetuning to improve breast concept prediction. This study highlights the promise of interpretable SAE latent representations in providing deeper insight into the internal workings of foundation models at every layer for breast imaging.
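
Below is a minimal, hedged sketch of the kind of patch-level sparse autoencoder the abstract describes: a single-layer encoder/decoder trained to reconstruct frozen Mammo-CLIP patch embeddings under an L1 sparsity penalty. The class name, feature dimensions, expansion factor, and loss weight are illustrative assumptions, not values from the paper.

```python
# Minimal sketch, assuming patch embeddings have already been extracted
# from a frozen Mammo-CLIP vision encoder. d_model, the expansion factor,
# and l1_weight are illustrative placeholders, not the paper's settings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchSAE(nn.Module):
    def __init__(self, d_model: int = 512, expansion: int = 8):
        super().__init__()
        d_latent = d_model * expansion            # overcomplete latent space
        self.encoder = nn.Linear(d_model, d_latent)
        self.decoder = nn.Linear(d_latent, d_model)

    def forward(self, x: torch.Tensor):
        # x: (num_patches, d_model) frozen patch embeddings
        z = F.relu(self.encoder(x))               # sparse, non-negative codes
        x_hat = self.decoder(z)
        return x_hat, z

def sae_loss(x, x_hat, z, l1_weight: float = 1e-3):
    # Reconstruction term plus an L1 penalty: sparsity pushes individual
    # latent neurons to specialize, which is what makes probing meaningful.
    return F.mse_loss(x_hat, x) + l1_weight * z.abs().mean()

# Toy usage with random stand-ins for real Mammo-CLIP patch features.
patches = torch.randn(1024, 512)
sae = PatchSAE()
x_hat, z = sae(patches)
loss = sae_loss(patches, x_hat, z)
loss.backward()
```
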
Problem

Research questions and friction points this paper is trying to address.

Interpreting breast cancer concepts in medical imaging models
Identifying latent features linked to clinical breast concepts
Analyzing model decision-making for breast concept prediction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse Autoencoder for breast imaging interpretability
Patch-level Mammo-SAE analyzes Mammo-CLIP features
Identifies latent neurons linked to clinical concepts (a probing sketch follows this list)
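
As a companion to the items above, here is a hedged sketch of how concept-linked neurons could be identified by probing: rank SAE latents by how much more they activate on patches from concept-positive images than on the rest. The function and variable names are hypothetical; the paper's exact probing protocol may differ.

```python
# Hedged sketch of concept-level probing: score each SAE latent neuron by
# how much more it activates on patches from concept-positive images than
# on the remaining patches. Names and shapes are hypothetical.
import torch

def top_concept_neurons(z: torch.Tensor, concept_mask: torch.Tensor, k: int = 10):
    """z: (num_patches, d_latent) SAE activations;
    concept_mask: (num_patches,) bool, True for concept-positive patches."""
    mean_pos = z[concept_mask].mean(dim=0)    # activation on concept patches
    mean_neg = z[~concept_mask].mean(dim=0)   # baseline on all other patches
    return torch.topk(mean_pos - mean_neg, k).indices

# Toy usage: random activations and labels stand in for real data. The
# selected neurons' patch activations can then be reshaped onto the image
# grid and compared against ground-truth lesion regions.
z = torch.rand(1024, 4096)
mask = torch.rand(1024) > 0.9
print(top_concept_neurons(z, mask))
```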