🤖 AI Summary
This work addresses the poor generalization of existing drug–drug interaction (DDI) prediction methods to unseen drugs by proposing AIM-DDI, a model-agnostic multimodal fusion module. The module maps heterogeneous drug information (structural, chemical, and semantic features) into a shared latent space and represents it as tokens in a common format. A reusable fusion mechanism then captures cross-modal dependencies among these tokens without requiring changes to the downstream prediction architecture. Because the fusion design is decoupled from the predictive model, the approach consistently improves performance across diverse DDI models on DrugBank-based benchmarks. The largest gains appear in challenging generalization scenarios such as the both-unseen setting, where neither drug in a test pair appears during training.
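The summary does not specify the fusion operator. The following is a minimal Python sketch of the general idea, assuming one token per modality, a per-modality linear projection into a shared latent space, and a single-head self-attention layer followed by mean pooling as the fusion step. The class `ModalityTokenFusion`, its parameters, the attention-based fusion, and the random (untrained) weights are all hypothetical illustrations, not the AIM-DDI implementation.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (d_out x d_in) matrix by a length-d_in vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Random Gaussian weights scaled by 1/sqrt(cols); untrained, for illustration only."""
    return [[rng.gauss(0.0, 1.0 / math.sqrt(cols)) for _ in range(cols)] for _ in range(rows)]


class ModalityTokenFusion:
    """Hypothetical fusion module: one token per modality, single-head self-attention, mean pooling."""

    def __init__(self, modality_dims: Dict[str, int], d_model: int, seed: int = 0):
        rng = random.Random(seed)
        self.d_model = d_model
        # Per-modality projection into the shared latent space (assumed linear).
        self.proj = {name: random_matrix(d_model, dim, rng) for name, dim in modality_dims.items()}
        # Shared query/key/value maps applied to every modality token.
        self.wq = random_matrix(d_model, d_model, rng)
        self.wk = random_matrix(d_model, d_model, rng)
        self.wv = random_matrix(d_model, d_model, rng)

    def tokens(self, features: Dict[str, Vector]) -> List[Vector]:
        # Build tokens only for modalities present in `features`, in sorted name order.
        return [matvec(self.proj[name], features[name]) for name in sorted(features)]

    def __call__(self, features: Dict[str, Vector]) -> Vector:
        toks = self.tokens(features)
        q = [matvec(self.wq, t) for t in toks]
        k = [matvec(self.wk, t) for t in toks]
        v = [matvec(self.wv, t) for t in toks]
        scale = 1.0 / math.sqrt(self.d_model)
        fused_tokens = []
        for qi in q:
            # Each modality token attends to every modality token (cross-modal mixing).
            weights = softmax([scale * sum(a * b for a, b in zip(qi, kj)) for kj in k])
            fused_tokens.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(self.d_model)])
        # Mean-pool the fused tokens into one fixed-size drug embedding.
        n = len(fused_tokens)
        return [sum(t[c] for t in fused_tokens) / n for c in range(self.d_model)]
```

Because the module returns a fixed-size vector per drug, any downstream DDI predictor that consumes drug embeddings could use it without architectural changes, which is the kind of decoupling the summary describes. Tokens are built only for the modalities supplied, so in this sketch a drug missing one modality still yields an embedding.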
📝 Abstract
Drug-drug interaction (DDI) prediction is a critical task in computational biomedicine, as adverse interactions between co-administered drugs can cause severe side effects and clinical risks. A key challenge is unseen-drug generalization, where interactions must be predicted for drugs not observed during training. Although multimodal DDI models exploit diverse drug-related information, their fusion mechanisms are often tied to specific prediction architectures, limiting their reuse across models. To address this, we propose AIM-DDI, an architecture-independent multimodal integration module that represents heterogeneous modality information as tokens in a shared latent space. By modeling dependencies across modality tokens through a unified fusion module, AIM-DDI enables model-agnostic integration of structural, chemical, and semantic drug signals across different DDI prediction architectures. Extensive evaluations across diverse DDI models and DrugBank-based settings show that AIM-DDI consistently improves prediction performance, with the strongest gains under the most challenging both-unseen setting where neither drug in a test pair is observed during training. These results suggest that treating multimodal integration as a reusable module, rather than a model-specific fusion component, is an effective strategy for robust unseen-drug DDI prediction.
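The unseen-drug settings can be made concrete by bucketing test pairs according to how many of their drugs appear in the training pairs. This Python sketch assumes interactions are unordered pairs of drug identifiers; the helper names `drugs_in` and `split_by_drug_novelty` and the drug IDs in the example are hypothetical, and the abstract does not describe how its DrugBank-based splits were constructed.

```python
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]


def drugs_in(pairs: Iterable[Pair]) -> Set[str]:
    """Collect every drug identifier that occurs in a collection of pairs."""
    return {drug for pair in pairs for drug in pair}


def split_by_drug_novelty(test_pairs: Iterable[Pair], train_drugs: Set[str]) -> Dict[str, List[Pair]]:
    """Bucket test pairs by how many of their two drugs were seen during training."""
    buckets: Dict[str, List[Pair]] = {"both_seen": [], "one_unseen": [], "both_unseen": []}
    for a, b in test_pairs:
        # bool + bool gives an int in {0, 1, 2}: the number of seen drugs in the pair.
        seen = (a in train_drugs) + (b in train_drugs)
        if seen == 2:
            buckets["both_seen"].append((a, b))
        elif seen == 1:
            buckets["one_unseen"].append((a, b))
        else:
            # Neither drug was observed in training: the hardest setting.
            buckets["both_unseen"].append((a, b))
    return buckets
```

For example, with training pairs `("A", "B")` and `("B", "C")`, the test pair `("A", "C")` lands in `both_seen`, `("A", "D")` in `one_unseen`, and `("E", "F")` in `both_unseen`, since neither E nor F occurs in any training pair.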