🤖 AI Summary
Microscopy image analysis models generalize poorly to de novo cell lines in high-throughput perturbation screening, primarily because morphological and biological heterogeneity across cell lines limits model transferability. To address this, we propose a knowledge-guided disentangled representation learning framework: (1) a biologically grounded knowledge graph, constructed from protein interaction data in the STRING and Hetionet databases, guides pretraining toward perturbation-specific representations; (2) transcriptomic features from single-cell foundation models are integrated to model cell-line-specific representations, with orthogonality enforced between the two factors via disentanglement. Evaluated on the RxRx datasets, with one-shot fine-tuning on an RxRx1 cell line and few-shot fine-tuning on RxRx19a cell lines, our method improves fine-tuning performance and the robustness of image representations for unseen cell lines. The core contribution is embedding structured biological priors, namely a domain-specific knowledge graph and single-cell transcriptomic features, into the pretraining of imaging models so that perturbation effects are disentangled from cellular context. This supports more generalizable phenotypic analysis for real-world, phenotype-based drug screening.
📝 Abstract
High-throughput screening techniques, such as microscopy imaging of cellular responses to genetic and chemical perturbations, play a crucial role in drug discovery and biomedical research. However, robust perturbation screening for de novo cell lines remains challenging due to the significant morphological and biological heterogeneity across cell lines. To address this, we propose a novel framework that integrates external biological knowledge into existing pretraining strategies to enhance microscopy image profiling models. Our approach explicitly disentangles perturbation-specific and cell line-specific representations using external biological information. Specifically, we construct a knowledge graph leveraging protein interaction data from the STRING and Hetionet databases to guide models toward perturbation-specific features during pretraining. Additionally, we incorporate transcriptomic features from single-cell foundation models to capture cell line-specific representations. By learning these disentangled features, our method improves the generalization of imaging models to de novo cell lines. We evaluate our framework on the RxRx database through one-shot fine-tuning on an RxRx1 cell line and few-shot fine-tuning on cell lines from the RxRx19a dataset. Experimental results demonstrate that our method enhances microscopy image profiling for de novo cell lines, highlighting its effectiveness in real-world phenotype-based drug discovery applications.
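The idea of keeping perturbation-specific and cell line-specific features separate can be illustrated with a small decorrelation penalty. This is only a minimal sketch under stated assumptions: the text above does not specify the actual loss, so the function names (`center_and_normalize`, `orthogonality_penalty`), the column-wise list layout, and the use of mean squared cross-correlation as the penalty are all hypothetical choices made for illustration, not the paper's implementation.

```python
import math
import random


def center_and_normalize(columns):
    """Center each feature column and scale it to unit length."""
    out = []
    for col in columns:
        mean = sum(col) / len(col)
        centered = [v - mean for v in col]
        # Fall back to 1.0 for a constant column to avoid division by zero.
        norm = math.sqrt(sum(v * v for v in centered)) or 1.0
        out.append([v / norm for v in centered])
    return out


def orthogonality_penalty(z_pert, z_cell):
    """Mean squared correlation over all (perturbation, cell-line) feature pairs.

    Hypothetical penalty, not the paper's loss. Each argument is a list of
    feature columns; every column holds one value per sample in the batch.
    """
    p_cols = center_and_normalize(z_pert)
    c_cols = center_and_normalize(z_cell)
    total = 0.0
    for p in p_cols:
        for c in c_cols:
            corr = sum(a * b for a, b in zip(p, c))  # Pearson correlation
            total += corr * corr
    return total / (len(p_cols) * len(c_cols))


# Toy data: 4 features per branch, 256 samples, drawn independently.
rng = random.Random(0)
n_samples = 256
z_pert = [[rng.gauss(0, 1) for _ in range(n_samples)] for _ in range(4)]
z_cell = [[rng.gauss(0, 1) for _ in range(n_samples)] for _ in range(4)]

independent = orthogonality_penalty(z_pert, z_cell)  # unrelated branches
entangled = orthogonality_penalty(z_pert, z_pert)    # branches share everything
print(independent < entangled)
```

In a training setup, a term of this kind would typically be added to the main objective with a weighting coefficient, so that the two embedding branches are pushed toward carrying non-overlapping information; the weighting and the choice of penalty are again assumptions here.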