Pharmolix-FM: All-Atom Foundation Models for Molecular Modeling and Generation

📅 2025-03-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current all-atom foundation models generalize poorly in molecular modeling and generation, largely because atomic data is multi-modal and training and sampling strategies have not been analyzed systematically. PharMolixFM is a unified framework for building all-atom foundation models, offered in three variants based on multi-modal generative models. It formulates molecular tasks as a generalized denoising process with task-specific priors, and it empirically explores an inference scaling law by increasing the number of sampling repeats or steps. On protein-small-molecule docking with a given pocket, PharMolixFM-Diff places 83.9% of predictions within RMSD < 2 Å, which the abstract describes as competitive against a 90.2% comparison figure, while running significantly faster at inference. The code and model are publicly available.

📝 Abstract
Structural biology relies on accurate three-dimensional biomolecular structures to advance our understanding of biological functions, disease mechanisms, and therapeutics. While recent advances in deep learning have enabled the development of all-atom foundation models for molecular modeling and generation, existing approaches face challenges in generalization due to the multi-modal nature of atomic data and the lack of comprehensive analysis of training and sampling strategies. To address these limitations, we propose PharMolixFM, a unified framework for constructing all-atom foundation models based on multi-modal generative techniques. Our framework includes three variants using state-of-the-art multi-modal generative models. By formulating molecular tasks as a generalized denoising process with task-specific priors, PharMolixFM achieves robust performance across various structural biology applications. Experimental results demonstrate that PharMolixFM-Diff achieves competitive prediction accuracy in protein-small-molecule docking (83.9% vs. 90.2% RMSD < 2 Å, given pocket) with significantly improved inference speed. Moreover, we explore the empirical inference scaling law by introducing more sampling repeats or steps. Our code and model are available at https://github.com/PharMolix/OpenBioMed.
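The docking figures above are success rates: the share of predicted ligand poses whose RMSD to the reference pose is below 2 Å. As a minimal sketch of how such a metric is commonly computed (not the authors' evaluation code), the snippet below assumes atoms are listed in the same order in both poses, that both poses are already in the receptor frame so no superposition is applied, and that symmetry correction is ignored. The function names `rmsd` and `success_rate` are hypothetical.

```python
import math

def rmsd(pred, ref):
    """RMSD in Å between two poses given as equally ordered lists of (x, y, z)."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("poses must be non-empty and have the same number of atoms")
    # Sum squared coordinate differences over all atoms and all three axes.
    squared = sum((p - r) ** 2 for a, b in zip(pred, ref) for p, r in zip(a, b))
    return math.sqrt(squared / len(pred))

def success_rate(pairs, threshold=2.0):
    """Fraction of (predicted, reference) pose pairs with RMSD below threshold."""
    hits = sum(1 for pred, ref in pairs if rmsd(pred, ref) < threshold)
    return hits / len(pairs)

# Toy data (assumed for illustration): a two-atom ligand and two predictions.
ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
good = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]  # shifted 1 Å along z
bad = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]   # shifted 3 Å along z

rate = success_rate([(good, ref), (bad, ref)])
```

Here one of the two toy poses falls under the 2 Å threshold, so `rate` is 0.5. Real benchmarks typically add symmetry-aware atom matching, which this sketch leaves out.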
Problem

Research questions and friction points this paper is trying to address.

Addresses generalization challenges in all-atom molecular modeling
Improves multi-modal generative techniques for structural biology
Enhances protein-small-molecule docking accuracy and speed
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-modal generative techniques for molecular modeling
Generalized denoising process with task-specific priors
Improved inference speed and prediction accuracy
Yizhen Luo
PharMolix Inc., Institute for AI Industry Research (AIR), Tsinghua University
Jiashuo Wang
PharMolix Inc., Institute for AI Industry Research (AIR), Tsinghua University
Siqi Fan
Institute for AI Industry Research (AIR), Tsinghua University, PharMolix Inc.
Zaiqing Nie
Tsinghua University
NLP, Data Mining, Machine Learning