Genetically Aligned Patient Representations Improve Hematological Diagnosis

📅 2026-05-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study aims to improve diagnostic accuracy in hematologic disorders by integrating peripheral blood cell images with genetic data. The authors propose a two-stage multimodal alignment framework: first, a transformer aggregator is pretrained on white blood cell images with a self-supervised, vision-only iBOT objective; second, a supervised contrastive loss aligns the resulting patient-level image features with chromosomal abnormalities (karyotype) and somatic mutations from targeted gene panels. The approach links single-cell morphology with cytogenetic and molecular genetic information, mirrors clinical diagnostic workflows, and supports off-the-shelf retrieval of diseases and genetic alterations. On hematological diagnostic tasks, the genetically aligned patient encoder outperforms slide-level histopathology foundation models.
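The retrieval use case mentioned in the summary can be illustrated with a minimal sketch. It assumes that patients and genetic alterations are already embedded in one shared vector space and that retrieval ranks candidates by cosine similarity. The function names, labels, and toy vectors are hypothetical and are not taken from the paper or its repository.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query, gallery, k=2):
    """Return the labels of the k gallery entries most similar to the query.

    gallery is a list of (label, embedding) pairs. In this toy setting the
    labels stand in for genetic alterations or diseases (an assumption).
    """
    ranked = sorted(gallery, key=lambda item: cosine(query, item[1]), reverse=True)
    return [label for label, _ in ranked[:k]]


# Toy data: hypothetical labels and hand-written 3-d embeddings.
gallery = [
    ("alteration_A", [1.0, 0.0, 0.0]),
    ("alteration_B", [0.0, 1.0, 0.0]),
    ("alteration_C", [0.7, 0.7, 0.0]),
]
patient_embedding = [0.9, 0.1, 0.0]
top_matches = retrieve(patient_embedding, gallery, k=2)
```

Swapping the roles of query and gallery (an alteration embedding as the query, patient embeddings as the gallery) gives retrieval in the opposite direction under the same assumption.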
📝 Abstract
Multimodal alignment of histopathology encoders with transcriptomic and genomic data has been shown to significantly improve performance in downstream diagnostic tasks. Hematological cytology is unique in that visual single-cell evaluation is often paired with cytogenetics and molecular genetics for blood cancer diagnosis. In this study, we present a framework to align single white blood cell images with chromosomal aberrations (karyotype) and somatic mutations from targeted gene panels. Our training strategy follows a two-stage approach: (i) self-supervised, vision-only pretraining of a transformer aggregator using an iBOT head on a cohort of over 1500 patients, and (ii) genetic alignment via supervised contrastive loss on acute myeloid leukemia patients. Our genetically aligned patient encoder improves hematological diagnostic tasks, outperforming slide-level histopathology foundation models. Additionally, the model provides off-the-shelf retrieval capabilities for diseases and genetic alterations. Incorporating genetic data into patient encoders increases the quality of patient representations, providing a framework that aligns with clinical diagnostic workflows and paves the way for future multimodal hematology-specific AI. The code and model weights are available at https://github.com/marrlab/GenBloom.
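As an illustration of stage (ii), the following is a minimal sketch of a generic supervised contrastive loss over a batch of patient embeddings. It assumes each patient carries a single integer label (for example, a genetic subgroup) and that patients sharing a label count as positives. How the paper actually defines positives from karyotype and mutation data is not stated in the abstract, so the labels, the temperature value, and the function names here are assumptions rather than the authors' implementation.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supcon_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss, averaged over anchors.

    For each anchor, every other sample with the same label is a positive.
    Anchors without any positive are skipped. Returns 0.0 if no anchor
    has a positive.
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled cosine similarities to all other samples.
        logits = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        # Mean negative log-probability of the positives for this anchor.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy batch: two clusters of hand-written 2-d embeddings.
batch = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_matching = supcon_loss(batch, [0, 0, 1, 1])    # labels agree with clusters
loss_mismatched = supcon_loss(batch, [0, 1, 0, 1])  # labels cut across clusters
```

The loss is small when samples with the same label already sit close together and large when they do not, which is the pressure that pulls image-derived patient embeddings toward a genetically organized space in this kind of alignment.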
Problem

Research questions and friction points this paper is trying to address.

hematological diagnosis
multimodal alignment
genetic data integration
patient representation
blood cancer
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal alignment
genetically aligned representations
self-supervised pretraining
supervised contrastive learning
hematological diagnosis
Muhammed Furkan Dasdelen
Institute of AI for Health, Helmholtz Zentrum München
machine learning
Fatih Ozlugedik
Institute of AI for Health, Helmholtz Munich, Germany
Ilaria Looser
Institute of AI for Health, Helmholtz Munich, Germany
Rao Muhammad Umer
Institute of AI for Health, Helmholtz Munich, Germany
Christian Pohlkamp
Munich Leukemia Laboratory, Germany
Carsten Marr
Institute of AI for Health @ Helmholtz Munich & Clinics @ LMU München
AI for Biomed & Health