🤖 AI Summary
Hydrogen atom transfer (HAT) plays a critical role in biological processes such as radical migration in damaged proteins, yet simulating it requires quantum chemical accuracy at biologically relevant scales. Classical force fields lack the accuracy, and density functional theory (DFT) molecular dynamics is too costly, so we train machine-learned potentials tailored to peptide systems. HAT configurations are generated systematically in peptides, and large datasets are built using semiempirical methods and DFT. We benchmark three graph neural network architectures (SchNet, Allegro, and MACE) on energies, forces, and reaction barriers derived indirectly from the predicted energies. MACE consistently outperforms the other two, reaching a mean absolute error of 1.13 kcal/mol on out-of-distribution DFT barriers. This accuracy allows ML potentials to be integrated into large-scale collagen simulations, where reaction rates are computed from predicted barriers. We also analyze scaling laws, model transferability, and cost-performance trade-offs, and outline how combining ML potentials with transition state search algorithms and active learning could improve the models further.
📝 Abstract
Hydrogen atom transfer (HAT) reactions are essential in many biological processes, such as radical migration in damaged proteins, but their mechanistic pathways remain incompletely understood. Simulating HAT is challenging due to the need for quantum chemical accuracy at biologically relevant scales; thus, neither classical force fields nor DFT-based molecular dynamics are applicable. Machine-learned potentials offer an alternative, able to learn potential energy surfaces (PESs) with near-quantum accuracy. However, training these models to generalize across diverse HAT configurations, especially at radical positions in proteins, requires tailored data generation and careful model selection. Here, we systematically generate HAT configurations in peptides to build large datasets using semiempirical methods and DFT. We benchmark three graph neural network architectures (SchNet, Allegro, and MACE) on their ability to learn HAT PESs and indirectly predict reaction barriers from energy predictions. MACE consistently outperforms the others in energy, force, and barrier prediction, achieving a mean absolute error of 1.13 kcal/mol on out-of-distribution DFT barrier predictions. This accuracy enables integration of ML potentials into large-scale collagen simulations to compute reaction rates from predicted barriers, advancing mechanistic understanding of HAT and radical migration in peptides. We analyze scaling laws, model transferability, and cost-performance trade-offs, and outline strategies for improvement by combining ML potentials with transition state search algorithms and active learning. Our approach is generalizable to other biomolecular systems, enabling quantum-accurate simulations of chemical reactivity in complex environments.
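To make the barrier and rate steps concrete, here is a minimal Python sketch of how a barrier could be read off a predicted energy profile, how a barrier MAE is computed, and how a barrier might be turned into a rate. It rests on assumptions not stated in the abstract: the barrier is taken as the highest energy along a path minus the energy of the first (reactant) frame, and the rate uses the Eyring equation with a transmission coefficient of 1, treating the barrier as an activation free energy. The function names and the numbers in the usage note are hypothetical and are not taken from the paper.

```python
import math

KB = 1.380649e-23               # Boltzmann constant, J/K
H = 6.62607015e-34              # Planck constant, J*s
R_KCAL = 1.98720425864083e-3    # gas constant, kcal/(mol*K)


def barrier_from_profile(energies):
    """Barrier (same units as input) from energies along a reaction path.

    Assumption: the first frame is the reactant state and the barrier is
    the maximum energy along the path relative to that frame.
    """
    return max(energies) - energies[0]


def mean_absolute_error(predicted, reference):
    """Mean absolute error between two equally long sequences."""
    if len(predicted) != len(reference):
        raise ValueError("sequences must have the same length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


def eyring_rate(barrier_kcal_mol, temperature=300.0):
    """Rate constant in 1/s from the Eyring equation.

    Assumptions: transmission coefficient of 1, and the barrier is used
    as the activation free energy (no entropic or tunneling corrections).
    """
    prefactor = KB * temperature / H
    return prefactor * math.exp(-barrier_kcal_mol / (R_KCAL * temperature))
```

As a usage illustration, `barrier_from_profile([0.0, 5.0, 12.5, 3.0])` gives a barrier of 12.5 in the input units. Under these assumptions, a 1.13 kcal/mol barrier error at 300 K changes the Eyring rate by roughly a factor of 6 to 7, because the barrier enters the rate exponentially.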