Sparse Autoencoders for Interpretable Medical Image Representation Learning

📅 2026-03-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limited interpretability of latent representations in medical vision foundation models, which hinders their clinical adoption. We train sparse autoencoders (SAEs) on large-scale medical imaging data to transform embeddings from foundation models such as DINOv3 into sparse, language-describable features that preserve semantic fidelity, substantially enhancing model transparency and clinical trustworthiness. Remarkably, only ten sparse features (a 99.4% dimensionality reduction) recover 87.8% of downstream task performance, with an embedding reconstruction R² of 0.941. The resulting representation further enables zero-shot, semantically aligned image retrieval. By integrating BiomedParse and large language models for automated feature interpretation, our method achieves both efficient compression of high-dimensional embeddings and human-understandable semantic mapping.

📝 Abstract
Vision foundation models (FMs) achieve state-of-the-art performance in medical imaging. However, they encode information in abstract latent representations that clinicians cannot interrogate or verify. The goal of this study is to investigate Sparse Autoencoders (SAEs) for replacing opaque FM image representations with human-interpretable, sparse features. We train SAEs on embeddings from BiomedParse (biomedical) and DINOv3 (general-purpose) using 909,873 CT and MRI 2D image slices from the TotalSegmentator dataset. We find that learned sparse features: (a) reconstruct original embeddings with high fidelity (R² up to 0.941) and recover up to 87.8% of downstream performance using only 10 features (99.4% dimensionality reduction), (b) preserve semantic fidelity in image retrieval tasks, (c) correspond to specific concepts that can be expressed in language using large language model (LLM)-based auto-interpretation, and (d) bridge clinical language and abstract latent representations in zero-shot language-driven image retrieval. Our work indicates SAEs are a promising pathway towards interpretable, concept-driven medical vision systems. Code repository: https://github.com/pwesp/sail.
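The core idea in the abstract, training a sparse autoencoder to reconstruct foundation-model embeddings through an overcomplete, sparsity-penalized bottleneck, can be sketched in a few lines. This is a minimal illustrative sketch, not the authors' implementation (see the linked repository for that): the dimensions, learning rate, L1 weight, synthetic stand-in data, and hand-derived gradients below are all assumptions for demonstration.

```python
import numpy as np

# Minimal sparse autoencoder (SAE) sketch: reconstruct FM embeddings
# through an overcomplete ReLU bottleneck with an L1 sparsity penalty.
# All dimensions and hyperparameters are illustrative, not the paper's.
rng = np.random.default_rng(0)

d_embed = 16      # embedding dim (real FM embeddings are much larger)
d_sparse = 64     # overcomplete sparse feature dim
l1_coef = 1e-3    # sparsity penalty weight (assumed)
lr = 1e-2

# Synthetic stand-in for foundation-model image embeddings
X = rng.normal(size=(256, d_embed))

W_enc = rng.normal(scale=0.1, size=(d_embed, d_sparse))
b_enc = np.zeros(d_sparse)
W_dec = rng.normal(scale=0.1, size=(d_sparse, d_embed))
b_dec = np.zeros(d_embed)

def forward(X):
    z = np.maximum(0.0, X @ W_enc + b_enc)  # sparse feature codes
    X_hat = z @ W_dec + b_dec               # embedding reconstruction
    return z, X_hat

def loss(X):
    z, X_hat = forward(X)
    recon = np.mean((X - X_hat) ** 2)           # reconstruction error
    sparsity = l1_coef * np.mean(np.abs(z))     # L1 sparsity penalty
    return recon + sparsity

losses = []
for step in range(200):
    z, X_hat = forward(X)
    n = X.shape[0]
    # Hand-derived gradients of the MSE + L1 objective for this sketch
    grad_Xhat = 2 * (X_hat - X) / (n * d_embed)
    grad_W_dec = z.T @ grad_Xhat
    grad_b_dec = grad_Xhat.sum(axis=0)
    grad_z = grad_Xhat @ W_dec.T + l1_coef * np.sign(z) / (n * d_sparse)
    grad_z *= (z > 0)                       # gate through the ReLU
    grad_W_enc = X.T @ grad_z
    grad_b_enc = grad_z.sum(axis=0)
    # Plain gradient descent updates
    W_enc -= lr * grad_W_enc
    b_enc -= lr * grad_b_enc
    W_dec -= lr * grad_W_dec
    b_dec -= lr * grad_b_dec
    losses.append(loss(X))
```

After training, each column of `W_dec` acts as a dictionary direction in embedding space, and the few nonzero entries of `z` per image are the sparse features the paper then labels with LLM-based auto-interpretation.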
Problem

Research questions and friction points this paper is trying to address.

interpretable representation
medical image analysis
vision foundation models
sparse autoencoders
clinical interpretability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse Autoencoders
Interpretable Representation
Medical Vision Foundation Models
LLM-based Interpretation
Zero-shot Image Retrieval
Philipp Wesp
Stanford Center for Artificial Intelligence in Medicine and Imaging, Stanford University, Stanford, CA, USA
Robbie Holland
Stanford Center for Artificial Intelligence in Medicine and Imaging, Stanford University, Stanford, CA, USA
Vasiliki Sideri-Lampretsa
Doctoral Student, Technical University of Munich
Medical imaging, AI in medicine, Image registration, Computer Vision
Sergios Gatidis
Stanford Medicine
Healthcare AI, Medical Image and Data Analysis, Pediatric Radiology, Hybrid Imaging