Learning from the electronic structure of molecules across the periodic table

📅 2025-09-30
📈 Citations: 0 · Influential: 0
🤖 AI Summary
To address the limited generalizability of machine-learned interatomic potentials (MLIPs) trained on little data, this work introduces Hamiltonian pretraining: a paradigm that uses molecular electronic Hamiltonian matrices, which encode orbital interactions, as supervisory signals for learning transferable atomic-environment representations. The approach is implemented in the neural network HELM, pretrained on the OMol_CSH_58k dataset with the def2-TZVPD basis set and then fine-tuned for potential-energy-surface fitting. The resulting model covers 58 elements and systems of over 100 atoms, and markedly improves energy-prediction accuracy in low-data regimes. The results show empirically that Hamiltonian matrices encode physical priors that transfer across elements and system sizes, offering an efficient, scalable route to few-shot MLIP development.
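
As a rough illustration of the two-stage idea, the sketch below pretrains an atomic-environment encoder by regressing orbital-interaction blocks of H. It is plain PyTorch; all module names, shapes, and the simple pairwise block head are hypothetical simplifications, not HELM's actual equivariant architecture.

```python
# Minimal sketch of Hamiltonian pretraining (illustrative, not HELM itself).
import torch
import torch.nn as nn

class AtomEncoder(nn.Module):
    """Stand-in for the atomic-environment encoder shared across both stages."""
    def __init__(self, n_feats: int, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_feats, dim), nn.SiLU(),
                                 nn.Linear(dim, dim))

    def forward(self, x):              # x: (n_atoms, n_feats)
        return self.net(x)             # (n_atoms, dim)

class BlockHead(nn.Module):
    """Predicts one orbital-interaction block H_ij from embeddings of atoms i, j."""
    def __init__(self, dim: int, n_orb: int):
        super().__init__()
        self.n_orb = n_orb
        self.net = nn.Sequential(nn.Linear(2 * dim, dim), nn.SiLU(),
                                 nn.Linear(dim, n_orb * n_orb))

    def forward(self, z_pair):         # z_pair: (n_pairs, 2*dim)
        return self.net(z_pair).view(-1, self.n_orb, self.n_orb)

n_atoms, n_feats, dim, n_orb = 8, 16, 64, 4
encoder, head = AtomEncoder(n_feats, dim), BlockHead(dim, n_orb)
opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=1e-3)

x = torch.randn(n_atoms, n_feats)               # toy per-atom features
i, j = torch.triu_indices(n_atoms, n_atoms)     # atom pairs, incl. on-site (i == j)
h_ref = torch.randn(i.numel(), n_orb, n_orb)    # reference H blocks from DFT output

emb = encoder(x)
h_pred = head(torch.cat([emb[i], emb[j]], dim=-1))
loss = nn.functional.mse_loss(h_pred, h_ref)    # supervision from H, not energies
opt.zero_grad(); loss.backward(); opt.step()
```

Because every atom pair contributes an entire matrix block rather than a single scalar, each structure supplies far more supervision than an energy label alone, which is what makes the learned embeddings useful downstream.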

📝 Abstract
Machine-Learned Interatomic Potentials (MLIPs) require vast amounts of atomic structure data to learn forces and energies, and their performance continues to improve with training set size. Meanwhile, the even greater quantities of accompanying data in the Hamiltonian matrix H behind these datasets have so far gone unused for this purpose. Here, we provide a recipe for integrating the orbital interaction data within H towards training pipelines for atomic-level properties. We first introduce HELM ("Hamiltonian-trained Electronic-structure Learning for Molecules"), a state-of-the-art Hamiltonian prediction model which bridges the gap between Hamiltonian prediction and universal MLIPs by scaling to H of structures with 100+ atoms, high elemental diversity, and large basis sets including diffuse functions. To accompany HELM, we release a curated Hamiltonian matrix dataset, 'OMol_CSH_58k', with unprecedented elemental diversity (58 elements), molecular size (up to 150 atoms), and basis set (def2-TZVPD). Finally, we introduce 'Hamiltonian pretraining' as a method to extract meaningful descriptors of atomic environments even from a limited number of atomic structures, and repurpose this shared embedding space to improve performance on energy prediction in low-data regimes. Our results highlight the use of electronic interactions as a rich and transferable data source for representing chemical space.
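
Continuing the illustration, the second stage might look like the hedged sketch below: the pretrained embedding space is reused and only a small energy head is fit on a handful of labeled structures. The checkpoint name and few-shot data are invented for the example.

```python
# Hedged sketch of fine-tuning for energy prediction in a low-data regime.
import torch
import torch.nn as nn

n_atoms, n_feats, dim = 8, 16, 64
encoder = nn.Sequential(nn.Linear(n_feats, dim), nn.SiLU(), nn.Linear(dim, dim))
# encoder.load_state_dict(torch.load("helm_pretrained.pt"))  # hypothetical checkpoint

energy_head = nn.Sequential(nn.Linear(dim, dim), nn.SiLU(), nn.Linear(dim, 1))
opt = torch.optim.Adam(list(encoder.parameters()) +
                       list(energy_head.parameters()), lr=1e-4)

# Few-shot supervision: a handful of (structure, total energy) pairs.
few_shot = [(torch.randn(n_atoms, n_feats), torch.tensor(-42.0))]
for x, e_ref in few_shot:
    e_pred = energy_head(encoder(x)).sum()       # energy as a sum over atoms
    loss = nn.functional.mse_loss(e_pred, e_ref)
    opt.zero_grad(); loss.backward(); opt.step()
```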
Problem

Research questions and friction points this paper is trying to address.

Integrating Hamiltonian matrix data into the training of machine-learned interatomic potentials (MLIPs)
Scaling Hamiltonian prediction to large, elementally diverse molecular systems
Using Hamiltonian pretraining to improve energy prediction when training data is scarce
Innovation

Methods, ideas, or system contributions that make the work stand out.

Repurposing Hamiltonian matrix data as supervision for atomic-level property training
Scaling Hamiltonian prediction to diverse structures of 100+ atoms with large basis sets
Hamiltonian pretraining that improves energy prediction in low-data regimes