Intervention-Aware Multiscale Representation Learning from Imaging Phenomics and Perturbation Transcriptomics

📅 2026-04-19

🤖 AI Summary

This work addresses two limitations: microscopy-based phenotypic profiling lacks mechanistic interpretability, and existing cross-modal methods, which often ignore cell-type specificity and dose effects, generalize poorly to unseen interventions. To overcome these limitations, the authors propose an intervention-aware knowledge distillation framework that, for the first time, disentangles and explicitly incorporates intervention semantics, cell-type context, and dose-response relationships into multimodal representation learning. The method aligns intervention semantics, rather than sample identities, through a chemistry-aware codebook and uses perturbation-induced transcriptomic profiles to guide image representation learning. It combines a fine-tuned single-cell foundation model, a transcriptome-conditioned teacher network, and an image-based student network. Evaluated on the Cell Painting and RxRx benchmarks, the approach substantially improves one-shot transfer to unseen interventions and enhances drug-target gene discovery, with theoretical risk bounds provided to support its generalization guarantees.

📝 Abstract

Microscopy-based phenotypic profiling is scalable for drug discovery but lacks the mechanistic depth of transcriptomics, which remains costly and scarce. Existing multimodal approaches either use images to support other modalities or naively align representations by sample identity, ignoring cell-type and dose variations in weakly paired data, which limits generalization to unseen interventions. In this paper, we introduce an intervention-aware distillation framework that leverages perturbational transcriptomics to guide image representation learning. A transcriptome-conditioned teacher integrates gene expression and intervention metadata to produce soft distributions over a chemistry-aware codebook organized by drug similarity. The teacher employs a fine-tuned single-cell foundation model to encode cell-type context and disentangle dose effects. An image-only student learns to predict these distributions from microscopy alone, distilling mechanistic knowledge while operating independently at test time. This design emphasizes intervention semantics rather than identity alignment and explicitly handles dose and cell-type mismatches. We provide theoretical guarantees showing that transcriptomic guidance tightens the risk bound for image-based prediction. On Cell Painting and RxRx datasets paired with L1000, our method significantly improves one-shot transfer to unseen interventions and drug-target gene discovery compared to self-supervised and alignment baselines.
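
The teacher-to-student distillation step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the cosine-similarity scoring, the temperature value, the KL-divergence objective, the array shapes, and every name (`teacher_soft_targets`, `distillation_loss`, `codebook`) are assumptions chosen for illustration, and random arrays stand in for real transcriptome and image encoders.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def teacher_soft_targets(teacher_emb, codebook, temperature=0.1):
    # Assumption: the teacher scores each codebook entry by cosine
    # similarity and turns the scores into a soft distribution.
    t = teacher_emb / np.linalg.norm(teacher_emb, axis=1, keepdims=True)
    c = codebook / np.linalg.norm(codebook, axis=1, keepdims=True)
    return softmax(t @ c.T / temperature)

def distillation_loss(student_logits, teacher_probs, eps=1e-12):
    # Assumption: the student is trained with KL(teacher || student),
    # averaged over the batch.
    log_q = np.log(softmax(student_logits) + eps)
    log_p = np.log(teacher_probs + eps)
    kl = (teacher_probs * (log_p - log_q)).sum(axis=1)
    return float(kl.mean())

# Toy stand-ins: 8 codebook entries (hypothetical drug clusters),
# 16-dimensional embeddings, and a batch of 4 samples.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 16))
teacher_emb = rng.normal(size=(4, 16))    # would come from transcriptome + metadata
student_logits = rng.normal(size=(4, 8))  # would come from the image-only student

teacher_probs = teacher_soft_targets(teacher_emb, codebook)
loss = distillation_loss(student_logits, teacher_probs)
print(f"distillation loss: {loss:.4f}")
```

In this sketch only `student_logits` depends on the image, so the student can be used without transcriptomic input at test time, which mirrors the independence property the abstract describes.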

Problem

Research questions and friction points this paper is trying to address.

multimodal learning

intervention generalization

phenotypic profiling

perturbation transcriptomics

dose and cell-type mismatch

Innovation

Methods, ideas, or system contributions that make the work stand out.

intervention-aware representation learning

multimodal distillation

perturbation transcriptomics