M3ST-DTI: A multi-task learning model for drug-target interactions based on multi-modal features and multi-stage alignment

📅 2025-10-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing drug–target interaction (DTI) prediction methods struggle to model deep intra-modal feature interactions and suffer from insufficient cross-modal alignment, limiting both predictive performance and generalizability. To address these challenges, we propose M3ST-DTI—a multimodal, multi-stage alignment framework. First, it introduces a two-phase alignment strategy: early-stage multi-scale contrastive alignment (MCA) with Gram-matrix structural regularization, followed by late-stage fine-grained bidirectional cross-attention (BCA). Second, it incorporates a deep orthogonal fusion module to suppress modality redundancy. Third, it integrates self-attention, hybrid pooling, and graph attention mechanisms to enable efficient representation learning and alignment of heterogeneous features—textual, structural, and functional. Extensive experiments on multiple benchmark datasets demonstrate that M3ST-DTI consistently outperforms state-of-the-art methods, achieving significant gains in prediction accuracy, robustness, and cross-dataset generalization.

📝 Abstract

Accurate prediction of drug-target interactions (DTI) is pivotal in drug discovery. However, existing approaches often fail to capture deep intra-modal feature interactions or achieve effective cross-modal alignment, limiting predictive performance and generalization. To address these challenges, we propose M3ST-DTI, a multi-task learning model that enables multi-stage integration and alignment of multi-modal features for DTI prediction. M3ST-DTI incorporates three types of features (textual, structural, and functional) and enhances intra-modal representations using self-attention mechanisms and a hybrid pooling graph attention module. For early-stage feature alignment and fusion, the model integrates MCA with Gram loss as a structural constraint. In the later stage, a BCA module captures fine-grained interactions between drugs and targets within each modality, while a deep orthogonal fusion module mitigates feature redundancy. Extensive evaluations on benchmark datasets demonstrate that M3ST-DTI consistently outperforms state-of-the-art methods across diverse metrics.
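To make the early-stage alignment idea more concrete, here is a minimal NumPy sketch of a contrastive alignment loss combined with a Gram-matrix structural penalty. It is an illustration under stated assumptions, not the paper's formulation: the symmetric InfoNCE objective, the squared-difference Gram penalty, the single-scale setup, and the names `info_nce`, `gram_loss`, `alignment_loss`, `temperature`, and `lam` are all hypothetical choices made for this example.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def info_nce(a, b, temperature=0.1):
    """Symmetric InfoNCE: row i of `a` is the positive for row i of `b`.

    Assumption: the paper's contrastive objective may differ from this one.
    """
    a, b = l2_normalize(a), l2_normalize(b)
    logits = a @ b.T / temperature  # (batch, batch) similarity matrix

    def diagonal_cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits.T))

def gram_loss(a, b):
    """Penalize differences between the two modalities' sample-by-sample
    similarity (Gram) matrices, so both embed the batch with similar geometry."""
    a, b = l2_normalize(a), l2_normalize(b)
    return np.mean((a @ a.T - b @ b.T) ** 2)

def alignment_loss(a, b, lam=0.5):
    # `lam` is a hypothetical weight on the structural term.
    return info_nce(a, b) + lam * gram_loss(a, b)

rng = np.random.default_rng(0)
text_emb = rng.normal(size=(8, 16))                       # e.g. textual features
struct_emb = text_emb + 0.01 * rng.normal(size=(8, 16))   # nearly aligned modality
print(alignment_loss(text_emb, struct_emb))        # small: pairs line up
print(alignment_loss(text_emb, struct_emb[::-1]))  # larger: pairs are mismatched
```

In this sketch the contrastive term pulls matching rows together across modalities, while the Gram term only compares the internal similarity structure of each batch.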

Problem

Research questions and friction points this paper is trying to address.

Predicting drug-target interactions using multi-modal features and alignment

Improving intra-modal feature interactions with self-attention mechanisms

Enhancing cross-modal alignment through multi-stage fusion techniques

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-task learning with multi-modal feature integration

Self-attention and graph attention for intra-modal enhancement

Multi-stage alignment using cross-attention and orthogonal fusion
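The cross-attention and orthogonal fusion ideas listed above can be sketched in a few lines of NumPy. This is a simplified illustration, not the authors' architecture: it omits learned query/key/value projections and multiple heads, uses mean pooling, and treats "orthogonal fusion" as removing from one vector its projection onto the other. The function names and tensor shapes are assumptions made for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(queries, context):
    """Each query token gathers a weighted mix of context tokens
    (scaled dot-product attention without learned projections)."""
    d = queries.shape[-1]
    weights = softmax(queries @ context.T / np.sqrt(d))  # (n_query, n_context)
    return weights @ context

def bidirectional_cross_attention(drug, target):
    # Drug tokens attend to target tokens and vice versa, with residual adds.
    drug_ctx = drug + cross_attend(drug, target)
    target_ctx = target + cross_attend(target, drug)
    # Mean-pool tokens into one vector per side (a hypothetical pooling choice).
    return drug_ctx.mean(axis=0), target_ctx.mean(axis=0)

def orthogonal_fuse(u, v, eps=1e-8):
    """Keep `u` and only the component of `v` orthogonal to `u`,
    so the fused vector does not repeat information shared by both."""
    v_perp = v - (v @ u) / (u @ u + eps) * u
    return np.concatenate([u, v_perp]), v_perp

rng = np.random.default_rng(0)
drug_tokens = rng.normal(size=(5, 8))    # e.g. 5 atom or substructure tokens
target_tokens = rng.normal(size=(7, 8))  # e.g. 7 residue tokens
drug_vec, target_vec = bidirectional_cross_attention(drug_tokens, target_tokens)
fused, target_perp = orthogonal_fuse(drug_vec, target_vec)
print(fused.shape)
```

The residual connections keep each side's own features, and the orthogonal step is one simple way to reduce redundancy before a prediction head.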

🔎 Similar Papers

Alifuse: Aligning and Fusing Multimodal Medical Data for Computer-Aided Diagnosis