🤖 AI Summary
This work addresses a central challenge in all-atom molecular simulation, reconciling quantum-mechanical accuracy with the scale of biological systems, by introducing UBio-MolFM, a universal molecular foundation model. UBio-MolFM combines multi-fidelity data, a linear-scaling equivariant Transformer architecture (E2Former-V2), and a three-stage curriculum learning strategy. Its main technical components are equivariant axis-aligned sparsification, explicit modeling of short- and long-range interactions, and force-focused supervision. The model reaches ab initio-level accuracy on benchmark tasks including liquid water structure, ion solvation, and peptide folding, and supports accurate, efficient simulations of out-of-distribution biological systems of up to ~1,500 atoms.
📝 Abstract
All-atom molecular simulation serves as a quintessential "computational microscope" for understanding the machinery of life, yet it remains fundamentally limited by the trade-off between quantum-mechanical (QM) accuracy and biological scale. We present UBio-MolFM, a universal foundation model framework specifically engineered to bridge this gap. UBio-MolFM introduces three synergistic innovations: (1) UBio-Mol26, a large bio-specific dataset constructed via a multi-fidelity "Two-Pronged Strategy" that combines systematic bottom-up enumeration with top-down sampling of native protein environments (up to 1,200 atoms); (2) E2Former-V2, a linear-scaling equivariant transformer that integrates Equivariant Axis-Aligned Sparsification (EAAS) and Long-Short Range (LSR) modeling to capture non-local physics with up to ~4x higher inference throughput in our large-system benchmarks; and (3) a Three-Stage Curriculum Learning protocol that transitions from energy initialization to energy-force consistency, with force-focused supervision to mitigate energy offsets. Rigorous benchmarking across microscopic forces and macroscopic observables (including liquid water structure, ionic solvation, and peptide folding) demonstrates that UBio-MolFM achieves ab initio-level fidelity on large, out-of-distribution biomolecular systems (up to ~1,500 atoms) and realistic MD observables. By reconciling scalability with quantum precision, UBio-MolFM provides a robust, ready-to-use tool for the next generation of computational biology.
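
The remark that force-focused supervision mitigates energy offsets rests on a general fact: forces are the negative gradient of the energy, so a constant shift in the energy labels leaves the forces unchanged. The sketch below illustrates this with a toy harmonic pair potential. It is not the paper's model or loss; `toy_energy`, `numerical_forces`, `energy_force_loss`, and the weights `w_energy` and `w_force` are hypothetical names chosen for illustration, since the abstract does not specify the actual loss.

```python
import numpy as np


def toy_energy(positions, offset=0.0):
    """Toy harmonic pair energy plus a constant offset (stand-in for a reference label)."""
    diff = positions[:, None, :] - positions[None, :, :]
    # 0.25 * sum over ordered pairs equals 0.5 * sum over unordered pairs of |r_i - r_j|^2
    return 0.25 * (diff ** 2).sum() + offset


def numerical_forces(energy_fn, positions, eps=1e-5):
    """Forces F = -dE/dr estimated by central finite differences."""
    forces = np.zeros_like(positions)
    for idx in np.ndindex(positions.shape):
        plus = positions.copy()
        minus = positions.copy()
        plus[idx] += eps
        minus[idx] -= eps
        forces[idx] = -(energy_fn(plus) - energy_fn(minus)) / (2 * eps)
    return forces


def energy_force_loss(e_pred, e_ref, f_pred, f_ref, w_energy, w_force):
    """Weighted squared-error loss on energy and forces (hypothetical weighting)."""
    return w_energy * (e_pred - e_ref) ** 2 + w_force * np.mean((f_pred - f_ref) ** 2)


rng = np.random.default_rng(0)
positions = rng.normal(size=(4, 3))

# Two "levels of theory" that differ only by a constant energy offset (assumed for illustration).
e_a = toy_energy(positions, offset=0.0)
e_b = toy_energy(positions, offset=100.0)
f_a = numerical_forces(lambda p: toy_energy(p, offset=0.0), positions)
f_b = numerical_forces(lambda p: toy_energy(p, offset=100.0), positions)

# Analytic forces for this toy potential: F_i = -(N * r_i - sum_j r_j)
f_exact = -(len(positions) * positions - positions.sum(axis=0))

energy_heavy = energy_force_loss(e_a, e_b, f_a, f_b, w_energy=1.0, w_force=0.0)
force_only = energy_force_loss(e_a, e_b, f_a, f_b, w_energy=0.0, w_force=1.0)

print("forces agree despite offset:", np.allclose(f_a, f_b, atol=1e-6))
print("energy-only loss:", energy_heavy)
print("force-only loss:", force_only)
```

In this toy setting the energy term penalizes the offset heavily while the force term is essentially zero, which is why weighting the loss toward forces makes training less sensitive to constant energy shifts between data sources.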