🤖 AI Summary
Existing biomolecular foundation models are predominantly unimodal, trained solely on proteins, small molecules, or omics data, and thus fail to capture cross-modal biological interactions, hindering mechanistic understanding of disease and drug discovery. To address this, the authors introduce MAMMAL (Molecular Aligned Multi-Modal Architecture and Language), a method for building multi-task foundation models that learn jointly from proteins, small molecules, and omics data. MAMMAL pairs cross-modal alignment with a structured prompt syntax that supports classification, regression, and generation tasks while handling both token and scalar inputs and outputs. Evaluated on eleven diverse downstream tasks, it reaches a new state of the art on nine and is comparable to the state of the art on the remaining two, all within a single unified architecture. Compared against AlphaFold 3 on antibody–antigen and nanobody–antigen binding classification, MAMMAL performs significantly better on 3 of 4 targets. The code and pretrained weights are publicly released.
📝 Abstract
Large language models applied to vast biological datasets have the potential to transform biology by uncovering disease mechanisms and accelerating drug development. However, current models are often siloed, trained separately on small molecules, proteins, or transcriptomic data, limiting their ability to capture complex, multi-modal interactions. Effective drug discovery requires computational tools that integrate multiple biological entities while supporting prediction and generation, a challenge existing models struggle to address. For this purpose, we present MAMMAL (Molecular Aligned Multi-Modal Architecture and Language), a versatile method applied to create a multi-task foundation model that learns from large-scale biological datasets across diverse modalities, including proteins, small molecules, and omics. MAMMAL's structured prompt syntax supports classification, regression, and generation tasks while handling token and scalar inputs and outputs. Evaluated on eleven diverse downstream tasks, it reaches a new state of the art (SOTA) in nine tasks and is comparable to SOTA in two tasks, all within a unified architecture, unlike prior task-specific models. Additionally, we explored AlphaFold 3's binding prediction capabilities on antibody-antigen and nanobody-antigen complexes, with MAMMAL showing significantly better classification performance on 3 out of 4 targets. The model code and pretrained weights are publicly available at https://github.com/BiomedSciAI/biomed-multi-alignment and https://huggingface.co/ibm/biomed.omics.bl.sm.ma-ted-458m
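To make the abstract's "structured prompt syntax" concrete, here is a minimal, hypothetical sketch of how a single prompt string might encode a task tag, modality-wrapped entity sequences, and an optional scalar value. All token names and the format below are illustrative assumptions for exposition, not MAMMAL's actual grammar; see the linked repository for the real encoding.

```python
# Hypothetical sketch of a structured prompt for a multi-task biomolecular
# model. Token names (<TASK=...>, <SCALAR>, modality tags) are illustrative
# assumptions, NOT MAMMAL's actual vocabulary.

def build_prompt(task, entities, scalar=None):
    """Compose one prompt string from a task tag, a dict mapping modality
    names to raw sequences (amino acids, SMILES, ...), and an optional
    scalar (e.g., a binding-affinity value for a regression task)."""
    parts = [f"<TASK={task}>"]
    for modality, seq in entities.items():
        # Wrapping each entity in modality-specific markers lets one
        # shared vocabulary host proteins, small molecules, and omics.
        parts.append(f"<{modality}>{seq}</{modality}>")
    if scalar is not None:
        # Scalars enter the sequence as a dedicated token plus its value.
        parts.append(f"<SCALAR>{scalar:.3f}")
    return "".join(parts)

# Example: a (hypothetical) antibody-antigen binding classification prompt.
prompt = build_prompt(
    "BINDING_CLASSIFICATION",
    {"PROTEIN_AB": "EVQLVESGGGLVQ", "PROTEIN_AG": "MKTAYIAKQRQISFVK"},
)
print(prompt)
```

Under this kind of grammar, classification, regression, and generation all reduce to sequence-to-sequence prediction over one mixed token/scalar stream, which is what lets a single architecture serve all eleven downstream tasks.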