🤖 AI Summary
This work addresses the lack of a principled out-of-distribution (OOD) evaluation protocol at the atomic level, which hinders reliable assessment of how well machine learning models generalize when predicting atomic properties such as partial charges and multipole moments. The authors propose a held-out evaluation scheme that clusters atomic environments by their SOAP descriptors and computes metrics only on clusters unseen during training. Using this protocol, they compare E(3)-equivariant models against rotationally augmented, non-equivariant ones and introduce QT-Net, a rotationally augmented, non-equivariant graph neural network that predicts quantum topological atom (QTA) properties (electron populations and multipole moments) for H, C, N, and O atoms. QT-Net can infer these properties for QM9 molecules outside its training set, molecular dipole moments computed from its per-atom outputs recover the reference values reported in QM9, and the inferred properties can improve downstream molecular property prediction when used as input features.
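As a rough illustration of the held-out metric described above, the sketch below computes a mean absolute error only over test atoms whose cluster label never appears in training. It assumes that cluster labels have already been obtained by clustering SOAP descriptors in a separate step; the function name `unseen_cluster_mae` and the toy data are hypothetical and are not taken from the released code.

```python
from typing import Sequence


def unseen_cluster_mae(
    train_labels: Sequence[int],
    test_labels: Sequence[int],
    y_true: Sequence[float],
    y_pred: Sequence[float],
) -> float:
    """Mean absolute error over test atoms whose cluster is absent from training."""
    seen = set(train_labels)
    # Keep only the test atoms that belong to clusters never seen in training.
    errors = [
        abs(t - p)
        for label, t, p in zip(test_labels, y_true, y_pred)
        if label not in seen
    ]
    if not errors:
        raise ValueError("no test atom belongs to an unseen cluster")
    return sum(errors) / len(errors)


# Toy data (hypothetical): clusters 0 and 1 occur in training, cluster 2 does not.
train_labels = [0, 0, 1, 1]
test_labels = [0, 2, 2, 1]
y_true = [1.0, 2.0, 4.0, 3.0]
y_pred = [1.5, 2.5, 3.0, 3.0]

ood_mae = unseen_cluster_mae(train_labels, test_labels, y_true, y_pred)
print(f"OOD MAE over unseen clusters: {ood_mae:.3f}")
```

Only the two atoms in cluster 2 contribute to the score here, so in-distribution test atoms cannot mask poor generalization to new atomic environments.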
📝 Abstract
Atomic properties such as partial charges or multipoles encode chemically meaningful information that can inform downstream molecular property prediction, but their evaluation as machine learning targets has been complicated by the absence of a principled out-of-distribution evaluation protocol at the atomic level. In this work, we propose a held-out evaluation protocol that clusters atomic environments by SOAP descriptors and computes metrics only over atoms whose cluster labels were unseen during training. Following this procedure, we use 5$\times$5 cross-validation and Tukey's HSD to run a statistically rigorous comparison of E(3)-equivariant against non-equivariant, rotationally augmented models for predicting electron populations and multipoles of H, C, N, and O atoms. Building on our results, we introduce the Quantum Topological Neural Network (QT-Net), a rotationally augmented, non-equivariant graph neural network. We show that QT-Net can be used to infer properties of atoms in molecules from QM9 outside our training set, and that these inferred properties can improve downstream molecular property prediction when used as input features. To further validate the framework, we show that molecular dipole moments computed from QT-Net's per-atom outputs recover the ground-truth values reported in QM9. We release all code and data, including a JAX implementation of QT-Net, to support the broader use of learned quantum topological atom (QTA) properties as inductive biases for atomic-scale molecular machine learning.
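The dipole check mentioned above can be illustrated with the standard decomposition of a molecular dipole into a charge-transfer term and an atomic polarization term, $\boldsymbol{\mu} = \sum_i (q_i \mathbf{r}_i + \boldsymbol{\mu}_i)$ with $q_i = Z_i - N_i$. The sketch below is a minimal example and not the authors' implementation. It assumes atomic units, atomic dipoles expressed in the global frame relative to each nucleus, and a neutral molecule (so the result does not depend on the origin). The function name and the toy numbers are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
AU_TO_DEBYE = 2.541746  # 1 e*bohr expressed in Debye


def molecular_dipole(
    atomic_numbers: Sequence[int],
    populations: Sequence[float],
    positions: Sequence[Vec3],
    atomic_dipoles: Sequence[Vec3],
) -> Vec3:
    """Sum q_i * r_i (charge transfer) and mu_i (atomic polarization) over atoms."""
    total = [0.0, 0.0, 0.0]
    for z, n, r, mu in zip(atomic_numbers, populations, positions, atomic_dipoles):
        q = z - n  # net atomic charge from the electron population
        for k in range(3):
            total[k] += q * r[k] + mu[k]
    return (total[0], total[1], total[2])


# Toy neutral diatomic along z (hypothetical values, atomic units).
atomic_numbers = [1, 1]
populations = [0.5, 1.5]
positions = [(0.0, 0.0, 0.0), (0.0, 0.0, 2.0)]
atomic_dipoles = [(0.0, 0.0, 0.1), (0.0, 0.0, 0.2)]

mu = molecular_dipole(atomic_numbers, populations, positions, atomic_dipoles)
norm_debye = math.sqrt(sum(c * c for c in mu)) * AU_TO_DEBYE
print(f"dipole vector (a.u.): {mu}, magnitude: {norm_debye:.4f} D")
```

Converting the magnitude to Debye allows a direct comparison with the dipole moments tabulated in QM9, which are reported in that unit.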