MedSAE: Dissecting MedCLIP Representations with Sparse Autoencoders

📅 2025-10-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address semantic ambiguity and insufficient clinical interpretability in medical vision-language models (e.g., MedCLIP), this paper proposes the Medical Sparse Autoencoder (MedSAE), enabling fine-grained neuron-level analysis of the latent space of MedCLIP, a model trained on chest X-ray–report alignment. Methodologically, MedSAE integrates sparse coding, multimodal representation learning, and automated neuron naming guided by the MedGEMMA large language model, underpinned by a quantitative evaluation framework assessing three dimensions: neuron–concept correlation, activation entropy, and semantic consistency. Experiments on CheXpert demonstrate that MedSAE significantly improves neuron monosemanticity (+32.7%) and clinical interpretability (expert interpretability scores ↑41.5%). It represents the first approach to achieve a verifiable, nameable, and traceable decomposition of MedCLIP's high-level representations, establishing a novel paradigm for enhancing trustworthiness and interpretability in medical AI.
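The sparse-coding step described above follows the standard sparse autoencoder recipe: an overcomplete hidden layer trained to reconstruct frozen embeddings under an L1 sparsity penalty. The paper does not publish its architecture details, so the dimensions, initialization, and penalty weight below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed dimensions: a 512-d MedCLIP embedding expanded into an
# overcomplete 2048-d sparse code (both values are hypothetical).
d_in, d_hidden = 512, 2048

# Randomly initialized encoder/decoder weights, for illustration only.
W_enc = rng.normal(0, 0.02, (d_in, d_hidden))
b_enc = np.zeros(d_hidden)
W_dec = rng.normal(0, 0.02, (d_hidden, d_in))

def encode(x):
    # ReLU encoder yields non-negative activations; combined with the
    # L1 penalty below, most of them are driven toward zero.
    return np.maximum(0.0, x @ W_enc + b_enc)

def decode(z):
    return z @ W_dec

def sae_loss(x, l1_coeff=1e-3):
    # Reconstruction error plus an L1 sparsity penalty on the code.
    z = encode(x)
    recon = decode(z)
    return np.mean((x - recon) ** 2) + l1_coeff * np.mean(np.abs(z))

# Stand-in batch of frozen MedCLIP image embeddings.
x = rng.normal(0, 1, (8, d_in))
print(sae_loss(x) > 0.0)
```

In the full pipeline, the encoder's hidden units are the "neurons" that are then scored for concept correlation and named via MedGEMMA.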

📝 Abstract
Artificial intelligence in healthcare requires models that are accurate and interpretable. We advance mechanistic interpretability in medical vision by applying Medical Sparse Autoencoders (MedSAEs) to the latent space of MedCLIP, a vision-language model trained on chest radiographs and reports. To quantify interpretability, we propose an evaluation framework that combines correlation metrics, entropy analyses, and automated neuron naming via the MedGEMMA foundation model. Experiments on the CheXpert dataset show that MedSAE neurons achieve higher monosemanticity and interpretability than raw MedCLIP features. Our findings bridge high-performing medical AI and transparency, offering a scalable step toward clinically reliable representations.
Problem

Research questions and friction points this paper is trying to address.

Dissecting medical vision-language model representations for interpretability
Quantifying interpretability through correlation metrics and entropy analysis
Improving monosemanticity of medical AI features for clinical reliability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse Autoencoders analyze MedCLIP latent space
Evaluation framework combines metrics and neuron naming
Neurons achieve higher monosemanticity than raw features
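One of the quantitative signals listed above is activation entropy: a neuron that fires on only a narrow set of inputs (low entropy) is more plausibly monosemantic than one that fires diffusely. The paper does not specify its exact formula, so the sketch below is one plausible instantiation, treating a neuron's normalized activations over a dataset as a distribution and taking its Shannon entropy:

```python
import numpy as np

def activation_entropy(acts, eps=1e-12):
    """Shannon entropy of a neuron's activation pattern across a dataset.

    `acts` holds one neuron's (non-negative) activations over N samples.
    Lower entropy means more selective firing; this is an assumed proxy
    for monosemanticity, not the paper's published metric.
    """
    p = np.maximum(acts, 0.0)
    p = p / (p.sum() + eps)          # normalize to a distribution
    return float(-(p * np.log(p + eps)).sum())

# A neuron firing on a single sample is maximally selective (entropy ~0);
# uniform firing over N samples gives the maximum entropy log(N).
selective = np.array([0.0, 0.0, 9.0, 0.0])
diffuse   = np.ones(4)
print(activation_entropy(selective))  # ~0.0
print(activation_entropy(diffuse))    # ~log(4) ≈ 1.386
```

Ranking SAE neurons by this score, alongside neuron-concept correlation against CheXpert labels, is one way the claimed monosemanticity gap between MedSAE neurons and raw MedCLIP features could be measured.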