CytoSAE: Interpretable Cell Embeddings for Hematology

📅 2025-07-16
🤖 AI Summary
Foundation models for hematological imaging lack interpretability tools, particularly for identifying pathological cells and subcellular abnormalities. To address this, we propose CytoSAE, the first sparse autoencoder (SAE) tailored to peripheral blood single-cell images. Applied to features from a Transformer-based foundation model, CytoSAE learns sparse, interpretable morphological concepts from over 40,000 images. The method supports patient-specific and disease-specific concept discovery, generalization to out-of-domain datasets (e.g., bone marrow cytology), and patch-level localization of subcellular abnormalities. On patient-level acute myeloid leukemia (AML) subtype classification, CytoSAE concepts reach performance comparable to the state of the art while yielding morphological concepts, including nuclear irregularity and cytoplasmic vacuolization, that were validated with medical experts. This work applies SAEs to hematological cell image analysis and improves the interpretability and clinical trustworthiness of foundation models in hematopathology.

📝 Abstract
Sparse autoencoders (SAEs) emerged as a promising tool for mechanistic interpretability of transformer-based foundation models. Very recently, SAEs were also adopted for the visual domain, enabling the discovery of visual concepts and their patch-wise attribution to tokens in the transformer model. While a growing number of foundation models emerged for medical imaging, tools for explaining their inferences are still lacking. In this work, we show the applicability of SAEs for hematology. We propose CytoSAE, a sparse autoencoder which is trained on over 40,000 peripheral blood single-cell images. CytoSAE generalizes to diverse and out-of-domain datasets, including bone marrow cytology, where it identifies morphologically relevant concepts which we validated with medical experts. Furthermore, we demonstrate scenarios in which CytoSAE can generate patient-specific and disease-specific concepts, enabling the detection of pathognomonic cells and localized cellular abnormalities at the patch level. We quantified the effect of concepts on a patient-level AML subtype classification task and show that CytoSAE concepts reach performance comparable to the state-of-the-art, while offering explainability on the sub-cellular level. Source code and model weights are available at https://github.com/dynamical-inference/cytosae.
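
To make the mechanism in the abstract concrete, here is a minimal NumPy sketch of a sparse autoencoder applied to transformer patch tokens. It is not the CytoSAE implementation: the ReLU encoder with an L1 sparsity penalty is one common SAE formulation, and the dimensions, the function names (`init_sae`, `patch_attribution`, `patient_concept_profile`), and the max-over-patches, mean-over-cells aggregation are all assumptions chosen for illustration. The released code at the URL above is the authoritative reference.

```python
import numpy as np

def init_sae(d_model, n_concepts, seed=0):
    """Randomly initialise SAE parameters (hypothetical sizes, untrained)."""
    rng = np.random.default_rng(seed)
    w_enc = rng.normal(0.0, 0.02, size=(d_model, n_concepts))
    return {
        "W_enc": w_enc,
        "b_enc": np.zeros(n_concepts),
        "W_dec": w_enc.T.copy(),   # decoder starts as the encoder transpose
        "b_dec": np.zeros(d_model),
    }

def encode(params, tokens):
    """Map tokens (..., d_model) to non-negative concept activations (..., n_concepts)."""
    pre = (tokens - params["b_dec"]) @ params["W_enc"] + params["b_enc"]
    return np.maximum(pre, 0.0)    # ReLU; sparsity comes from the L1 term in the loss

def decode(params, acts):
    """Reconstruct tokens from concept activations."""
    return acts @ params["W_dec"] + params["b_dec"]

def sae_loss(params, tokens, l1_coeff=1e-3):
    """Reconstruction error plus an L1 penalty on activations (assumed objective)."""
    acts = encode(params, tokens)
    mse = np.mean((decode(params, acts) - tokens) ** 2)
    l1 = np.mean(np.sum(np.abs(acts), axis=-1))
    return mse + l1_coeff * l1

def patch_attribution(params, tokens, concept, grid):
    """Activation of one concept per patch, reshaped to the image's patch grid."""
    return encode(params, tokens)[:, concept].reshape(grid)

def patient_concept_profile(params, cell_tokens):
    """Aggregate (n_cells, n_patches, d_model) tokens into one vector per patient.

    Assumed aggregation: strongest patch per cell, then the mean over cells.
    """
    acts = encode(params, cell_tokens)   # (n_cells, n_patches, n_concepts)
    return acts.max(axis=1).mean(axis=0)
```

In this sketch, `patch_attribution` corresponds to the patch-wise attribution of a concept within a single cell image, and `patient_concept_profile` yields a fixed-length vector that a downstream classifier (for example, for AML subtypes) could consume.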
Problem

Research questions and friction points this paper is trying to address.

Lack of interpretability tools for medical imaging foundation models
Need for identifying morphologically relevant cell concepts in hematology
Detection of patient-specific and disease-specific cellular abnormalities
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse autoencoders for hematology image analysis
Generalization to diverse and out-of-domain datasets, including bone marrow cytology
Generation of patient-specific and disease-specific concepts