Multi-view biomedical foundation models for molecule-target and property prediction

📅 2024-10-25
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To improve the accuracy of molecular target and property prediction for drug discovery, this work introduces BioMultiView, a multi-view biomedical foundation model that integrates molecular graphs, 2D structural images, and textual representations (SMILES/BERT). Single-view encoders are each pre-trained on a dataset of up to 200 million molecules and then aggregated into combined representations that balance the strengths and weaknesses of the individual views. The model is validated on 18 tasks spanning ligand-protein binding, solubility, metabolism, and toxicity, where it performs robustly. It is then used to screen compounds against more than 100 GPCR targets, 33 of which are related to Alzheimer's disease. Strong binders predicted for this subset are validated through structure-based modeling and the identification of key binding motifs.

📝 Abstract
Foundation models applied to bio-molecular space hold promise to accelerate drug discovery. Molecular representation is key to building such models. Previous works have typically focused on a single representation, or view, of the molecules. Here, we develop a multi-view foundation model approach that integrates graph, image, and text views of molecules. Single-view foundation models are each pre-trained on a dataset of up to 200M molecules and then aggregated into combined representations. Our multi-view model is validated on a diverse set of 18 tasks, encompassing ligand-protein binding, molecular solubility, metabolism, and toxicity. We show that the multi-view models perform robustly and are able to balance the strengths and weaknesses of specific views. We then apply this model to screen compounds against a large set (>100 targets) of G protein-coupled receptors (GPCRs). From this library of targets, we identify 33 that are related to Alzheimer's disease. On this subset, we employ our model to identify strong binders, which are validated through structure-based modeling and identification of key binding motifs.
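The abstract says the single-view models are "aggregated into combined representations" but does not say how. As a rough illustration only, the sketch below assumes one simple option: a softmax-weighted sum of per-view embeddings that share a dimension. The function names (`softmax`, `aggregate_views`), the per-view scores, and the toy three-dimensional embeddings are all hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Turn arbitrary real-valued scores into weights that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_views(view_embeddings, view_scores):
    """Combine per-view embeddings with a softmax-weighted sum.

    ASSUMPTION: this weighting scheme is illustrative; the source does
    not specify the aggregation used by the authors.
    """
    names = sorted(view_embeddings)
    dims = {len(view_embeddings[name]) for name in names}
    if len(dims) != 1:
        raise ValueError("all view embeddings must share one dimension")
    weights = softmax([view_scores[name] for name in names])
    combined = [0.0] * dims.pop()
    for name, weight in zip(names, weights):
        for i, value in enumerate(view_embeddings[name]):
            combined[i] += weight * value
    return combined, dict(zip(names, weights))


# Toy embeddings for one molecule (made-up values, one per view).
embeddings = {
    "graph": [1.0, 0.0, 0.0],
    "image": [0.0, 1.0, 0.0],
    "text": [0.0, 0.0, 1.0],
}
# Equal scores give each view the same weight.
scores = {"graph": 0.0, "image": 0.0, "text": 0.0}

combined, weights = aggregate_views(embeddings, scores)
print(weights)
print(combined)
```

In a trained model the per-view scores would be learned, possibly per task, so that a view that is weak on one task can be down-weighted there. That is one way to read the abstract's remark about balancing the strengths and weaknesses of specific views.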
Problem

Research questions and friction points this paper is trying to address.

Multimodal Biomedical Modeling
Drug Discovery Acceleration
Alzheimer's Disease Treatment

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-representation Learning
Drug Property Prediction
GPCR Binding Affinity
Parthasarathy Suryanarayanan
IBM TJ Watson Research Center, 1101 Kitchawan Rd, Yorktown Heights, NY, 10598, USA.
Yunguang Qiu
Cleveland Clinic Genome Center, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, 44195, USA; Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, 44195, USA.
Shreyans Sethi
IBM Research - Almaden, 650 Harry Rd, San Jose, CA, 95120, USA.
Diwakar Mahajan
Applied Scientist, IBM Research / MIT-IBM Watson AI Lab
Natural Language Processing, Machine Learning, Clinical NLP
Hongyang Li
IBM TJ Watson Research Center, 1101 Kitchawan Rd, Yorktown Heights, NY, 10598, USA.
Yuxin Yang
Cleveland Clinic Genome Center, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, 44195, USA.
Elif Eyigoz
IBM TJ Watson Research Center, 1101 Kitchawan Rd, Yorktown Heights, NY, 10598, USA.
A. G. Saenz
IBM TJ Watson Research Center, 1101 Kitchawan Rd, Yorktown Heights, NY, 10598, USA.
Daniel E. Platt
IBM TJ Watson Research Center, 1101 Kitchawan Rd, Yorktown Heights, NY, 10598, USA.
Timothy H. Rumbell
IBM TJ Watson Research Center, 1101 Kitchawan Rd, Yorktown Heights, NY, 10598, USA.
Kenney Ng
IBM Research, 75 Binney St, Cambridge, MA, 02142, USA.
Sanjoy Dey
Research Scientist, Center for Computational Health, IBM T. J. Watson Research Center
Artificial Intelligence, Machine Learning, Data Mining, Health Informatics, Computational Biology
Myson Burch
IBM TJ Watson Research Center, 1101 Kitchawan Rd, Yorktown Heights, NY, 10598, USA.
Bum Chul Kwon
IBM Research
Information Visualization, Visual Analytics, Human Computer Interaction, Machine Learning, Health Informatics
Pablo Meyer
Research Staff Member, IBM
AI, Olfaction, systems biology, metabolism, circadian rhythms
Feixiong Cheng
Cleveland Clinic
Alzheimer’s Disease, Cardio-Oncology, Drug Repurposing, Network Medicine, Systems Pharmacology
Jianying Hu
IBM TJ Watson Research Center, 1101 Kitchawan Rd, Yorktown Heights, NY, 10598, USA.
Joseph A. Morrone
IBM TJ Watson Research Center, 1101 Kitchawan Rd, Yorktown Heights, NY, 10598, USA.