🤖 AI Summary
Problem: Accurately predicting the electronic band gap of a crystalline material from its chemical composition alone remains challenging, because band gap distributions are non-Gaussian and multimodal and no structural information is available.
Method: This work proposes a lightweight, interpretable model that treats the band gap as a mixed random variable and frames prediction accordingly. Each element is assigned a single scalar parameter fit from data; the predicted band gap is a weighted average of the parameters of the constituent elements, truncated below at zero (the maximum of the average and zero). The model deliberately omits crystal-structure inputs and relies only on chemical composition.
Contribution/Results: On diverse inorganic crystal datasets, the model achieves accuracy competitive with state-of-the-art structure-aware deep learning models while reducing parameter count by three orders of magnitude. The learned elemental parameters are chemically consistent with established physical quantities, including electronegativity and orbital energy levels, so the model combines predictive performance with interpretability and physical plausibility.
📝 Abstract
In solid-state materials science, substantial efforts have been devoted to the calculation and modeling of the electronic band gap. While a wide range of ab initio methods and machine learning algorithms have been developed to predict this quantity, the development of new computational approaches for studying the band gap remains an active area of research. Here we introduce a simple machine learning model for predicting the band gap using only the chemical composition of the crystalline material. To motivate the form of the model, we first analyze the empirical distribution of the band gap, which sheds new light on its atypical statistics. Specifically, our analysis enables us to frame band gap prediction as a task of modeling a mixed random variable, and we design our model accordingly. Our model formulation incorporates ideas from chemical heuristic models for other material properties in a manner suited to the band gap modeling task. The model has exactly one parameter for each element, which is fit to data. To predict the band gap for a given material, the model computes a weighted average of the parameters associated with its constituent elements and then takes the maximum of this quantity and zero. The model provides heuristic chemical interpretability by intuitively capturing the associations between the band gap and individual chemical elements.
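The prediction rule described in the abstract (a weighted average of per-element parameters, followed by the maximum with zero) can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function name `predict_band_gap` is hypothetical, the weights are assumed to be atomic fractions (the abstract does not specify the weighting), and the parameter values are made-up placeholders rather than fitted results.

```python
from typing import Dict


def predict_band_gap(composition: Dict[str, float],
                     element_params: Dict[str, float]) -> float:
    """Return max(weighted average of element parameters, 0).

    composition maps element symbol -> amount in the formula unit.
    Assumption: weights are atomic fractions (amount / total amount).
    """
    total = sum(composition.values())
    if total <= 0:
        raise ValueError("composition must contain a positive amount of atoms")
    # Weighted average of the per-element parameters.
    average = sum(amount * element_params[element]
                  for element, amount in composition.items()) / total
    # Truncate below at zero, so metal-like compositions map to a zero gap.
    return max(average, 0.0)


# Hypothetical parameter values, chosen only to exercise both branches.
example_params = {"Na": 2.0, "Cl": 10.0, "Fe": -3.0, "Al": -1.0}

gap_nacl = predict_band_gap({"Na": 1, "Cl": 1}, example_params)  # positive average
gap_feal = predict_band_gap({"Fe": 1, "Al": 1}, example_params)  # negative average, clipped
```

Under this reading, a negative average is clipped to exactly zero, which is how a single deterministic rule can reproduce the mixed character of the band gap: a cluster of materials at zero alongside a continuous range of positive values.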