Self-Refining Training for Amortized Density Functional Theory

📅 2025-06-02
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Amortized density functional theory (DFT) models heavily rely on large pre-collected datasets, severely limiting the efficiency of molecular simulations. Method: We propose a self-refining training paradigm that— for the first time—integrates variational upper-bound minimization with synchronous conformational sampling and model training, enabling end-to-end, self-driven data generation and model optimization for DFT solvers. Our approach unifies variational inference, KL-divergence optimization, asynchronous deep learning training, and Boltzmann-distribution-guided molecular conformational sampling to dynamically generate high-information-content samples during training. Results: Experiments demonstrate significant improvements in energy prediction accuracy under identical computational budgets, alongside over 90% reduction in reliance on pre-stored data. The implementation is open-source and supports high-concurrency sampling–training pipelines.

Technology Category

Application Category

📝 Abstract
Density Functional Theory (DFT) allows for predicting all the chemical and physical properties of molecular systems from first principles by finding an approximate solution to the many-body Schr""odinger equation. However, the cost of these predictions becomes infeasible when increasing the scale of the energy evaluations, e.g., when calculating the ground-state energy for simulating molecular dynamics. Recent works have demonstrated that, for substantially large datasets of molecular conformations, Deep Learning-based models can predict the outputs of the classical DFT solvers by amortizing the corresponding optimization problems. In this paper, we propose a novel method that reduces the dependency of amortized DFT solvers on large pre-collected datasets by introducing a self-refining training strategy. Namely, we propose an efficient method that simultaneously trains a deep-learning model to predict the DFT outputs and samples molecular conformations that are used as training data for the model. We derive our method as a minimization of the variational upper bound on the KL-divergence measuring the discrepancy between the generated samples and the target Boltzmann distribution defined by the ground state energy. To demonstrate the utility of the proposed scheme, we perform an extensive empirical study comparing it with the models trained on the pre-collected datasets. Finally, we open-source our implementation of the proposed algorithm, optimized with asynchronous training and sampling stages, which enables simultaneous sampling and training. Code is available at https://github.com/majhas/self-refining-dft.
Problem

Research questions and friction points this paper is trying to address.

Reducing dependency on large datasets for amortized DFT solvers
Simultaneously training and sampling molecular conformations for DFT
Minimizing KL-divergence between generated samples and Boltzmann distribution
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-refining training reduces dataset dependency
Simultaneous deep-learning training and conformation sampling
Asynchronous optimization for efficient sampling and training