NaFM: Pre-training a Foundation Model for Small-Molecule Natural Products

📅 2025-03-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing molecular pretraining models, designed primarily for synthetic compounds, inadequately capture the structural diversity, biosynthetic pathways, and evolutionary patterns characteristic of natural products (NPs), which limits their generalization and performance on downstream NP tasks. To address this, we propose NaFM, a foundation model tailored to small-molecule NPs. Its pretraining strategy combines contrastive learning and masked graph learning objectives to emphasize evolutionary information from molecular scaffolds while capturing side-chain information. Evaluated on NP taxonomy classification, fine-grained analyses at the gene and microbial levels, and virtual screening, NaFM achieves state-of-the-art results and outperforms baselines built for synthetic molecules, indicating that its representations encode biosynthetic and evolutionary information.

📝 Abstract
Natural products, as metabolites from microorganisms, animals, or plants, exhibit diverse biological activities, making them crucial for drug discovery. Existing deep learning methods for natural products research rely primarily on supervised learning approaches designed for specific downstream tasks. However, this one-model-per-task paradigm often lacks generalizability and leaves significant room for performance improvement. Additionally, existing molecular characterization methods are not well suited to the unique tasks associated with natural products. To address these limitations, we have pre-trained a foundation model for natural products based on their unique properties. Our approach employs a novel pretraining strategy tailored specifically to natural products. By incorporating contrastive learning and masked graph learning objectives, we emphasize evolutionary information from molecular scaffolds while capturing side-chain information. Our framework achieves state-of-the-art (SOTA) results on various downstream tasks related to natural product mining and drug discovery. We first compare taxonomy classification performance against baselines focused on synthesized molecules to demonstrate that current models are inadequate for understanding natural synthesis. Furthermore, through a fine-grained analysis at both the gene and microbial levels, NaFM demonstrates the ability to capture evolutionary information. Finally, we evaluate our method on virtual screening, showing that informative natural product representations can lead to more effective identification of potential drug candidates.
Problem

Research questions and friction points this paper is trying to address.

Existing task-specific deep learning methods for natural products lack generalizability.
Current models, built for synthesized molecules, fail to capture how natural products are synthesized in nature.
Existing molecular characterization methods are not well suited to natural product tasks.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pre-trained foundation model for natural products
Contrastive and masked graph learning objectives
Captures evolutionary and side-chain information
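
The two objective families named above can be illustrated with a minimal, generic sketch. It is not the NaFM implementation: it shows a standard InfoNCE contrastive loss and plain random node masking, whereas the paper's scaffold-specific design is not described on this page. The function names `info_nce` and `mask_nodes`, the temperature, the mask ratio, and the `[MASK]` token are all assumptions chosen for illustration.

```python
import math
import random


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss; row i of `anchors` should match row i of `positives`.

    Generic contrastive objective (illustrative, not the paper's exact loss).
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of anchor i to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum_exp - logits[i]  # negative log-softmax at the true pair
    return total / len(a)


def mask_nodes(atom_labels, mask_ratio=0.15, mask_token="[MASK]", seed=0):
    """Hide a random subset of node labels (hypothetical masking scheme).

    Returns the masked label list and a dict {index: original label} that a
    model would be trained to reconstruct.
    """
    rng = random.Random(seed)
    n_mask = min(len(atom_labels), max(1, round(mask_ratio * len(atom_labels))))
    masked = list(atom_labels)
    targets = {}
    for i in sorted(rng.sample(range(len(atom_labels)), n_mask)):
        targets[i] = masked[i]
        masked[i] = mask_token
    return masked, targets
```

In a pretraining loop of this kind, the contrastive loss would be computed on embeddings of two related views of each molecule, and the masked objective would be a classification loss over the hidden labels in `targets`. How the two are weighted, and how views and masks are chosen for natural products, is specific to the paper and not reproduced here.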

Yuheng Ding
Peking University
Deep Learning, AI4Science, Computer Science
Yusong Wang
Tokyo Institute of Technology
Representation Learning, Affective Computing
Bo Qiang
Institute of Protein Design, University of Washington
Protein Design, AI4science
Jie Yu
State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, Beijing, 100191, China.
Qi Li
State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, Beijing, 100191, China.
Yiran Zhou
University of Technology Sydney
SLAM, Robotic, 3D Reconstruction
Zhenmin Liu
State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, Beijing, 100191, China.