AI Summary
Problem: Modeling defect interactions in two-dimensional materials relies heavily on black-box neural networks, which offer poor physical interpretability and limited extrapolation capability.
Method: This work applies the Symbolic Expression Generative Variational Autoencoder (SEGVAE), a deep generative symbolic regression framework that combines variational autoencoding, graph-structured encoding, and optimized sampling of symbolic expressions, to automatically discover concise, interpretable, analytic physical equations directly from data.
Contribution/Results: The derived closed-form expressions for defect energy as a function of atomic configuration achieve prediction accuracy competitive with state-of-the-art graph neural networks. They also offer explicit physical meaning, strong generalization to unseen defect types and configurations, and robust extrapolation beyond the training domain. This points to an approach to defect modeling in 2D materials that combines high predictive performance with intrinsic interpretability and physical consistency.
Abstract
Machine learning models have become firmly established across the sciences. Neural network models that extract features from data and make inferences from them often achieve high accuracy; however, this approach has drawbacks, most notably the limited interpretability of its results. Symbolic regression is a powerful technique for discovering analytical equations that describe data, yielding interpretable and generalizable models that can predict unseen data. Symbolic regression methods have gained new momentum with advances in neural network technologies and offer several advantages, chief among them the interpretability of the results. In this work, we examine the application of the deep symbolic regression algorithm SEGVAE to determining the properties of two-dimensional materials with defects. Comparison with state-of-the-art graph neural network-based methods shows comparable and, in some cases, identical results. We also discuss the applicability of this class of methods in the natural sciences.
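Symbolic regression searches a space of candidate formulas for the one that best fits the data. The following is a minimal sketch of that idea using the simplest possible strategy: exhaustive enumeration of small expression trees over a hypothetical toy grammar (`sin`, `exp`, a squaring operator `sq`, the operators `+`, `-`, `*`, and the constants 1, 2, 3), scored by mean squared error, with ties broken by formula length as a crude parsimony criterion. It is not SEGVAE, which generates candidate expressions with a variational autoencoder rather than by brute force; the grammar, the function names `enumerate_expressions` and `symbolic_regression`, and the toy data are all assumptions made for illustration.

```python
import itertools
import math

# Hypothetical toy grammar (an assumption for illustration, not SEGVAE's):
# unary operators, binary operators, one input variable x, and three constants.
UNARY = {
    "sin": math.sin,
    "exp": math.exp,
    "sq": lambda v: v * v,
}
BINARY = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}
CONSTANTS = (1.0, 2.0, 3.0)


def enumerate_expressions():
    """Yield (text, func) pairs for every expression tree up to depth 2."""
    # Depth 0: the variable and the constants (default args bind each constant).
    level0 = [("x", lambda x: x)] + [
        (repr(c), lambda x, c=c: c) for c in CONSTANTS
    ]
    # Depth 1: one operator applied to depth-0 leaves.
    level1 = list(level0)
    for (name, f), (text, g) in itertools.product(UNARY.items(), level0):
        level1.append((f"{name}({text})", lambda x, f=f, g=g: f(g(x))))
    for (op, h), (ta, a), (tb, b) in itertools.product(
        BINARY.items(), level0, level0
    ):
        level1.append((f"({ta} {op} {tb})", lambda x, h=h, a=a, b=b: h(a(x), b(x))))
    yield from level1
    # Depth 2: one more operator on top of depth-0/1 subtrees.
    for (name, f), (text, g) in itertools.product(UNARY.items(), level1):
        yield f"{name}({text})", lambda x, f=f, g=g: f(g(x))
    for (op, h), (ta, a), (tb, b) in itertools.product(
        BINARY.items(), level1, level1
    ):
        yield f"({ta} {op} {tb})", lambda x, h=h, a=a, b=b: h(a(x), b(x))


def symbolic_regression(xs, ys):
    """Return (expression_text, mse) of the best-fitting enumerated formula.

    Candidates are ranked by mean squared error; ties are broken by shorter
    text (a crude parsimony criterion) and then alphabetically, so the
    result does not depend on enumeration order.
    """
    best = None
    for text, func in enumerate_expressions():
        try:
            preds = [func(x) for x in xs]
        except (OverflowError, ValueError):
            continue  # skip numerically invalid candidates
        mse = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)
        key = (mse, len(text), text)
        if best is None or key < best:
            best = key
    return best[2], best[0]


if __name__ == "__main__":
    # Toy data generated from y = x^2 + 1 (a stand-in, not data from this work).
    xs = [-2.0 + 0.25 * i for i in range(17)]
    ys = [x * x + 1.0 for x in xs]
    print(symbolic_regression(xs, ys))
```

On this toy data the search recovers an exact closed form for x² + 1. The number of candidates grows combinatorially with tree depth, which is one motivation for learned generators, such as the VAE-based approach described above, that steer the search in realistic settings.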