Self-Refining Training for Amortized Density Functional Theory

📅 2025-06-02

📈 Citations: 0

✨ Influential: 0

career value

201K/year

🤖 AI Summary

Amortized density functional theory (DFT) models heavily rely on large pre-collected datasets, severely limiting the efficiency of molecular simulations. Method: We propose a self-refining training paradigm that— for the first time—integrates variational upper-bound minimization with synchronous conformational sampling and model training, enabling end-to-end, self-driven data generation and model optimization for DFT solvers. Our approach unifies variational inference, KL-divergence optimization, asynchronous deep learning training, and Boltzmann-distribution-guided molecular conformational sampling to dynamically generate high-information-content samples during training. Results: Experiments demonstrate significant improvements in energy prediction accuracy under identical computational budgets, alongside over 90% reduction in reliance on pre-stored data. The implementation is open-source and supports high-concurrency sampling–training pipelines.

Technology Category

Application Category

📝 Abstract

Density Functional Theory (DFT) allows for predicting all the chemical and physical properties of molecular systems from first principles by finding an approximate solution to the many-body Schr""odinger equation. However, the cost of these predictions becomes infeasible when increasing the scale of the energy evaluations, e.g., when calculating the ground-state energy for simulating molecular dynamics. Recent works have demonstrated that, for substantially large datasets of molecular conformations, Deep Learning-based models can predict the outputs of the classical DFT solvers by amortizing the corresponding optimization problems. In this paper, we propose a novel method that reduces the dependency of amortized DFT solvers on large pre-collected datasets by introducing a self-refining training strategy. Namely, we propose an efficient method that simultaneously trains a deep-learning model to predict the DFT outputs and samples molecular conformations that are used as training data for the model. We derive our method as a minimization of the variational upper bound on the KL-divergence measuring the discrepancy between the generated samples and the target Boltzmann distribution defined by the ground state energy. To demonstrate the utility of the proposed scheme, we perform an extensive empirical study comparing it with the models trained on the pre-collected datasets. Finally, we open-source our implementation of the proposed algorithm, optimized with asynchronous training and sampling stages, which enables simultaneous sampling and training. Code is available at https://github.com/majhas/self-refining-dft.

Problem

Research questions and friction points this paper is trying to address.

Reducing dependency on large datasets for amortized DFT solvers

Simultaneously training and sampling molecular conformations for DFT

Minimizing KL-divergence between generated samples and Boltzmann distribution

Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-refining training reduces dataset dependency

Simultaneous deep-learning training and conformation sampling

Asynchronous optimization for efficient sampling and training

🔎 Similar Papers

Regularized Adaptive Momentum Dual Averaging with an Efficient Inexact Subproblem Solver for Training Structured Neural Network