MolCLIP: A Molecular-Auxiliary CLIP Framework for Identifying Drug Mechanism of Action Based on Time-Lapsed Mitochondrial Images

📅 2025-07-10
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing MoA identification methods predominantly rely on static cellular images. They neglect the dynamic temporal responses of live cells and do not incorporate information about drug molecular structure. To address these limitations, the authors propose a molecule-augmented CLIP framework that, for the first time, integrates drug molecular representations into a vision-language model to guide spatiotemporal representation learning on mitochondrial time-lapse videos. Using cross-modal alignment and metric learning, the method fuses molecular and video features, jointly modeling drug chemical structure and dynamic cellular phenotypes. Evaluated on MitoDataset, the approach improves mAP by 20.5% for MoA classification and by 51.2% for drug identification over state-of-the-art methods. Key contributions include: (1) establishing the first bimodal (molecule and cell video) temporal learning paradigm tailored for MoA identification; and (2) designing a molecule-guided CLIP architecture with a dynamic feature aggregation strategy.

๐Ÿ“ Abstract
The study of drug Mechanism of Action (MoA) investigates how drug molecules interact with cells, which is crucial for drug discovery and clinical application. Recently, deep learning models have been used to recognize MoA from high-content and fluorescence images of cells exposed to various drugs. However, these methods focus on spatial characteristics and overlook the temporal dynamics of live cells, even though time-lapse imaging is better suited to observing how cells respond to drugs. In addition, drug molecules can trigger dynamic cellular changes related to a specific MoA, which suggests that the drug molecule modality may complement the image modality. This paper proposes MolCLIP, the first vision-language model to combine microscopic cell video and molecule modalities. MolCLIP uses a molecule-auxiliary CLIP framework that guides video features to learn the distribution of the molecular latent space. Furthermore, we integrate a metric learning strategy with MolCLIP to optimize the aggregation of video features. Experimental results on the MitoDataset demonstrate that MolCLIP improves mAP by 51.2% for drug identification and by 20.5% for MoA recognition.
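The abstract does not spell out the alignment objective, so the following is only a minimal sketch of what a CLIP-style video-to-molecule alignment could look like, assuming a symmetric contrastive (InfoNCE) loss over a batch of paired embeddings. The function names, the temperature of 0.07, and the toy embeddings are illustrative assumptions, not details taken from the paper.

```python
import math

def l2_normalize(vec):
    # Scale a non-zero vector to unit length so dot products are cosine similarities.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def similarity_logits(video_embs, mol_embs, temperature=0.07):
    # logits[i][j] = cosine(video_i, molecule_j) / temperature
    videos = [l2_normalize(v) for v in video_embs]
    mols = [l2_normalize(m) for m in mol_embs]
    return [[sum(a * b for a, b in zip(v, m)) / temperature for m in mols]
            for v in videos]

def diagonal_cross_entropy(logits):
    # Mean cross-entropy where the correct "class" for row i is column i.
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)
        log_z = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_z - row[i]
    return total / len(logits)

def clip_style_loss(video_embs, mol_embs, temperature=0.07):
    # Symmetric loss: video-to-molecule plus molecule-to-video, averaged.
    logits = similarity_logits(video_embs, mol_embs, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (diagonal_cross_entropy(logits)
                  + diagonal_cross_entropy(transposed))

# Toy batch (hypothetical values): video i is paired with molecule i.
videos = [[1.0, 0.0], [0.0, 1.0]]
aligned_mols = [[1.0, 0.0], [0.0, 1.0]]
shuffled_mols = [[0.0, 1.0], [1.0, 0.0]]
loss_aligned = clip_style_loss(videos, aligned_mols)
loss_shuffled = clip_style_loss(videos, shuffled_mols)
```

In this sketch the loss is small when each video embedding is closest to its own drug's molecule embedding and large when the pairing is scrambled, which is the sense in which a molecular latent space could guide the video features.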
Problem

Research questions and friction points this paper is trying to address.

Identifying drug mechanism via temporal cell imaging dynamics
Integrating molecular data with cell video for MoA analysis
Improving drug and MoA recognition accuracy with multimodal learning
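Recognition quality on this page is reported as mAP. The exact evaluation protocol is not given here, so the sketch below assumes one common definition: per-class average precision over samples ranked by predicted score, averaged across classes. The function names and toy scores are hypothetical.

```python
def average_precision(ranked_relevance):
    # ranked_relevance: booleans ordered from highest to lowest score.
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    return precision_sum / hits if hits else 0.0

def mean_average_precision(scores, labels):
    # scores[i][c]: predicted score of sample i for class c.
    # labels[i]: ground-truth class index of sample i.
    n_classes = len(scores[0])
    per_class = []
    for c in range(n_classes):
        order = sorted(range(len(scores)),
                       key=lambda i: scores[i][c], reverse=True)
        per_class.append(average_precision([labels[i] == c for i in order]))
    return sum(per_class) / n_classes

# Toy example (hypothetical scores): three videos, two MoA classes.
toy_scores = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8]]
toy_labels = [0, 0, 1]
toy_map = mean_average_precision(toy_scores, toy_labels)
```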
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines microscopic cell video and molecule modalities
Uses molecule-auxiliary CLIP framework
Integrates metric learning for feature optimization
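How metric learning shapes the aggregated video feature is not detailed on this page. As a rough illustration only, the sketch below assumes mean pooling of per-frame features into one video embedding and a triplet margin loss that pulls videos of the same drug together and pushes videos of different drugs apart. Both choices, the margin of 1.0, and all names are assumptions rather than the paper's actual strategy.

```python
import math

def mean_pool(frame_features):
    # Aggregate per-frame feature vectors into a single video embedding.
    n_frames = len(frame_features)
    return [sum(column) / n_frames for column in zip(*frame_features)]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Zero once the negative is at least `margin` farther away than the positive.
    return max(0.0, euclidean(anchor, positive)
               - euclidean(anchor, negative) + margin)

# Toy frame features (hypothetical): two frames per video, 2-D features.
anchor = mean_pool([[1.0, 0.0], [1.0, 0.0]])        # video of drug A
positive = mean_pool([[1.0, 0.2], [1.0, -0.2]])     # another video of drug A
negative = mean_pool([[0.0, 3.0], [0.0, 3.0]])      # video of drug B
loss_good = triplet_loss(anchor, positive, negative)
loss_bad = triplet_loss(anchor, negative, positive)  # roles swapped
```

A learned aggregation (for example attention weights over frames) could replace `mean_pool` here without changing the loss; mean pooling is used only to keep the sketch short.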
Fengqian Pang
North China University of Technology
medical image processing, deep learning
Chunyue Lei
North China University of Technology
Hongfei Zhao
Beijing Neusoft Medical Equipment Co., Ltd.
Chenghao Liu
Beijing Institute of Technology
Zhiqiang Xing
North China University of Technology
Huafeng Wang
North China University of Technology
Chuyang Ye
Associate Professor, School of Integrated Circuits and Electronics, Beijing Institute of Technology
Medical Image Analysis