Uni-Mol3: A Multi-Molecular Foundation Model for Advancing Organic Reaction Modeling

📅 2025-07-29
🤖 AI Summary
Modeling organic reactions is difficult because it requires capturing complex relational dependencies among multiple molecules and a deep understanding of reaction mechanisms. Method: We propose the first foundational pre-trained model specifically designed for multi-molecule systems. Our approach introduces a hierarchical learning framework coupled with a multi-scale 3D molecular tokenizer, establishing the first geometry-aware molecular language system. We further design a two-stage pre-training paradigm that operates at both the molecular and reaction levels, and we incorporate prompt-aware fine-tuning and cross-task transfer strategies. Contribution/Results: Evaluated on ten benchmark datasets spanning four reaction prediction tasks, including reaction classification, condition prediction, and retrosynthesis, the model consistently outperforms state-of-the-art methods in prediction accuracy and generalization. These results indicate that it can capture intricate organic reaction mechanisms and intermolecular interactions.

📝 Abstract
Organic reactions, the foundation of the modern chemical industry, are crucial for new material development and drug discovery. However, deciphering reaction mechanisms and modeling multi-molecular relationships remain formidable challenges due to the complexity of molecular dynamics. While several state-of-the-art models like Uni-Mol2 have revolutionized single-molecular representation learning, their extension to multi-molecular systems, where chemical reactions inherently occur, has been underexplored. This paper introduces Uni-Mol3, a novel deep learning framework that employs a hierarchical pipeline for multi-molecular reaction modeling. At its core, Uni-Mol3 adopts a multi-scale molecular tokenizer (Mol-Tokenizer) that encodes 3D structures of molecules and other features into discrete tokens, creating a 3D-aware molecular language. The framework innovatively combines two pre-training stages: molecular pre-training to learn the molecular grammars and reaction pre-training to capture fundamental reaction principles, forming a progressive learning paradigm from single- to multi-molecular systems. With prompt-aware downstream fine-tuning, Uni-Mol3 demonstrates exceptional performance in diverse organic reaction tasks and supports multi-task prediction with strong generalizability. Experimental results across 10 datasets spanning 4 downstream tasks show that Uni-Mol3 outperforms existing methods, validating its effectiveness in modeling complex organic reactions. This work not only ushers in an alternative paradigm for multi-molecular computational modeling but also charts a course for intelligent organic reaction by bridging molecular representation with reaction mechanism understanding.
Problem

Research questions and friction points this paper is trying to address.

Modeling complex multi-molecular organic reaction mechanisms
Extending single-molecular representation to multi-molecular systems
Bridging 3D molecular structures with reaction principles
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical pipeline for multi-molecular reaction modeling
Multi-scale molecular tokenizer encoding 3D structures
Two-stage pre-training for molecular and reaction principles
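
To make the idea of a "3D-aware molecular language" concrete, here is a minimal toy sketch in Python. It is not the paper's Mol-Tokenizer, whose design is not described on this page. Everything in it is an assumption made for illustration: the `Atom` and `Molecule` types, the `element@rN` token format, the 0.5 Å bin width, and the use of `.` and `>>` as molecule and reaction separators (borrowed from reaction SMILES). Each atom becomes a discrete token built from its element symbol and a binned distance to the molecule's centroid, and a task prompt token is prepended to hint at how prompt-aware fine-tuning could select a downstream task.

```python
import math
from typing import List, Tuple

# Hypothetical types for illustration only; not taken from the paper.
Atom = Tuple[str, float, float, float]  # (element, x, y, z), coordinates in angstroms
Molecule = List[Atom]


def distance_bin(d: float, width: float = 0.5, n_bins: int = 8) -> int:
    """Map a non-negative distance to a bin index, clamping to the last bin."""
    return min(int(d / width), n_bins - 1)


def tokenize_molecule(atoms: Molecule) -> List[str]:
    """Emit one token per atom: element symbol plus binned distance to the centroid."""
    if not atoms:
        return []
    n = len(atoms)
    centroid = tuple(sum(atom[i] for atom in atoms) / n for i in (1, 2, 3))
    return [
        f"{elem}@r{distance_bin(math.dist((x, y, z), centroid))}"
        for elem, x, y, z in atoms
    ]


def tokenize_side(molecules: List[Molecule]) -> List[str]:
    """Tokenize one side of a reaction, separating molecules with '.'."""
    tokens: List[str] = []
    for i, mol in enumerate(molecules):
        if i > 0:
            tokens.append(".")
        tokens.extend(tokenize_molecule(mol))
    return tokens


def tokenize_reaction(
    reactants: List[Molecule], products: List[Molecule], task: str
) -> List[str]:
    """Build a multi-molecule sequence: <task> reactants >> products."""
    return [f"<{task}>"] + tokenize_side(reactants) + [">>"] + tokenize_side(products)


if __name__ == "__main__":
    water = [("O", 0.0, 0.0, 0.0), ("H", 0.96, 0.0, 0.0), ("H", -0.24, 0.93, 0.0)]
    print(tokenize_reaction([water], [water], task="classify"))
```

Distance to the centroid is used here only because it does not change when the molecule is translated or rotated, so the same geometry always yields the same tokens. A learned tokenizer would presumably encode far richer multi-scale structure than this single binned feature.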