🤖 AI Summary
Problem: Accurate diagnosis of hematologic disorders is hindered by the lack of genetic signatures that generalize across hematopoietic lineages.
Method: We propose an autoencoder-based foundation model trained on multipotent progenitor cells. A disease classifier trained on progenitor embeddings is applied to downstream cells such as monocytes and lymphocytes without lineage-specific training data (zero-shot cross-lineage classification). Instead of modeling each cell type in isolation, the progenitor-trained encoder embeds both progenitor and differentiated cells in one latent space. The quality of that space is evaluated with fully connected networks, Transformers, and graph convolutional networks (GCNs) trained on the embeddings for disease classification.
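The zero-shot step can be sketched as follows. This is a minimal sketch under explicit assumptions. The embeddings are synthetic Gaussian vectors standing in for encoder outputs. The classifier is a plain logistic-regression head, not the paper's feed-forward, Transformer, or GCN models. `fit_logistic_head`, `predict`, and `synthetic_embeddings` are hypothetical helpers. Only the protocol is illustrated: the head is fit on labeled progenitor embeddings and then applied, frozen, to another lineage's embeddings.

```python
import numpy as np

LATENT_DIM = 256  # size of the latent space reported in the abstract


def fit_logistic_head(z, y, lr=0.1, epochs=500):
    """Fit a binary logistic-regression head on embeddings z (cells, dims), labels y in {0, 1}."""
    n, d = z.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        logits = np.clip(z @ w + b, -30.0, 30.0)
        p = 1.0 / (1.0 + np.exp(-logits))  # predicted probability of the "disease" class
        err = p - y  # gradient of binary cross-entropy with respect to the logits
        w -= lr * (z.T @ err) / n
        b -= lr * err.mean()
    return w, b


def predict(z, w, b):
    """Apply the frozen head as-is; nothing is refit on the target lineage."""
    return (z @ w + b > 0).astype(int)


def synthetic_embeddings(rng, n, disease_shift, lineage_shift):
    """Hypothetical encoder outputs: diseased cells are displaced along `disease_shift`."""
    y = rng.integers(0, 2, size=n)
    z = rng.normal(size=(n, LATENT_DIM)) + lineage_shift + np.outer(y, disease_shift)
    return z, y


rng = np.random.default_rng(0)
disease_shift = rng.normal(size=LATENT_DIM)
disease_shift *= 4.0 / np.linalg.norm(disease_shift)
lineage_shift = rng.normal(size=LATENT_DIM)
lineage_shift *= 1.0 / np.linalg.norm(lineage_shift)

# Source: progenitor-cell embeddings with disease labels (used for training).
z_prog, y_prog = synthetic_embeddings(rng, 1000, disease_shift, np.zeros(LATENT_DIM))
w, b = fit_logistic_head(z_prog, y_prog)

# Target: downstream-cell embeddings (e.g., monocytes); labels are used only for scoring.
z_mono, y_mono = synthetic_embeddings(rng, 300, disease_shift, lineage_shift)
y_hat = predict(z_mono, w, b)
zero_shot_accuracy = float((y_hat == y_mono).mean())
print(f"zero-shot accuracy on synthetic downstream cells: {zero_shot_accuracy:.2f}")
```

The head never sees target-lineage labels. Its accuracy on that lineage therefore depends entirely on whether the shared latent space encodes disease signal the same way across lineages. In this synthetic data that alignment holds by construction, and real embeddings need not satisfy it.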
Contribution/Results: The approach achieves >95% accuracy in multi-class disease classification and F1-scores >0.7 in zero-shot binary classification. These results suggest that progenitor-cell representations transfer to downstream differentiated cells, which may support mechanistic study of hematologic diseases and precision diagnostics. Lymphocyte classification is the case the authors flag as still needing more robust embeddings.
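For reference, the F1-score is the harmonic mean of precision P and recall R: F1 = 2PR / (P + R) = 2TP / (2TP + FP + FN). The summary does not say whether its F1 is computed for the disease class alone or averaged over classes. The sketch below assumes a single positive class, and its toy labels are invented for illustration, not taken from the paper.

```python
def binary_f1(y_true, y_pred, positive=1):
    """F1-score for one positive class: 2 * TP / (2 * TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0  # no positives predicted or present: report 0


# Toy example (invented labels): 1 = diseased, 0 = healthy.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy={accuracy:.2f}, F1={binary_f1(y_true, y_pred):.2f}")  # accuracy=0.80, F1=0.75
```

F1 ignores true negatives, so when the positive class is rare it can sit well below accuracy. For that reason the two thresholds in the results are not directly comparable.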
📝 Abstract
We present a foundation modeling framework that uses deep learning to uncover latent genetic signatures across the hematopoietic hierarchy. Our approach trains a fully connected autoencoder on multipotent progenitor cells, reducing over 20,000 gene features to a 256-dimensional latent space that captures predictive information for both progenitor cells and downstream differentiated cells such as monocytes and lymphocytes. We validate the quality of these embeddings by training feed-forward, transformer, and graph convolutional architectures for blood disease diagnosis tasks. We also explore zero-shot prediction, using a classifier trained on progenitor disease states to classify the condition of downstream cells. Our models achieve greater than 95% accuracy for multi-class classification, and in the zero-shot setting they achieve an F1-score greater than 0.7 on the binary classification task. Future work should further improve the embeddings to increase robustness on lymphocyte classification specifically.
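A fully connected autoencoder of the kind described above can be sketched in NumPy. None of the following assumptions come from the paper, which does not state its depth, hidden widths, activations, loss, or optimizer. The sketch assumes a single tanh bottleneck layer, a mean-squared-error reconstruction loss, full-batch Adam, and a synthetic low-rank matrix in place of real expression data. The demo uses 500 genes and a 16-dimensional latent space so it runs quickly. The abstract's setting would correspond roughly to `init_autoencoder(rng, 20000, 256)`, and all function names here are hypothetical.

```python
import numpy as np


def init_autoencoder(rng, n_genes, latent_dim):
    """Weights for genes -> latent (tanh) -> genes (linear); sizes are illustrative."""
    return {
        "W1": rng.normal(0.0, 1.0 / np.sqrt(n_genes), size=(n_genes, latent_dim)),
        "b1": np.zeros(latent_dim),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(latent_dim), size=(latent_dim, n_genes)),
        "b2": np.zeros(n_genes),
    }


def encode(params, x):
    """Map expression profiles (cells, genes) to embeddings (cells, latent_dim)."""
    return np.tanh(x @ params["W1"] + params["b1"])


def loss_and_grads(params, x):
    """Mean squared reconstruction error and its gradients by manual backpropagation."""
    z = encode(params, x)
    diff = z @ params["W2"] + params["b2"] - x  # reconstruction minus input
    loss = float(np.mean(diff ** 2))
    d_xhat = 2.0 * diff / diff.size
    d_pre = (d_xhat @ params["W2"].T) * (1.0 - z ** 2)  # tanh'(a) = 1 - tanh(a)^2
    grads = {
        "W1": x.T @ d_pre,
        "b1": d_pre.sum(axis=0),
        "W2": z.T @ d_xhat,
        "b2": d_xhat.sum(axis=0),
    }
    return loss, grads


def train(params, x, steps=500, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Full-batch Adam, updating params in place; returns the loss recorded before each step."""
    m = {k: np.zeros_like(p) for k, p in params.items()}
    s = {k: np.zeros_like(p) for k, p in params.items()}
    history = []
    for t in range(1, steps + 1):
        loss, grads = loss_and_grads(params, x)
        history.append(loss)
        for k in params:
            m[k] = beta1 * m[k] + (1 - beta1) * grads[k]
            s[k] = beta2 * s[k] + (1 - beta2) * grads[k] ** 2
            m_hat = m[k] / (1 - beta1 ** t)
            s_hat = s[k] / (1 - beta2 ** t)
            params[k] -= lr * m_hat / (np.sqrt(s_hat) + eps)
    return history


rng = np.random.default_rng(0)
# Synthetic stand-in for expression data: 200 cells x 500 genes, rank-8 signal plus noise.
factors = rng.normal(size=(200, 8))
loadings = rng.normal(size=(8, 500)) / np.sqrt(8)
x = factors @ loadings + 0.1 * rng.normal(size=(200, 500))

params = init_autoencoder(rng, n_genes=500, latent_dim=16)
history = train(params, x)
z = encode(params, x)
print(z.shape)  # (200, 16)
print(f"reconstruction MSE: {history[0]:.3f} -> {history[-1]:.3f}")
```

After training, only `encode` is needed downstream. Its outputs are the fixed-length embeddings on which the diagnostic classifiers are fit, and the same encoder can be applied unchanged to cells from other lineages.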