🤖 AI Summary
In materials science, machine learning models often struggle to achieve high predictive accuracy and chemical interpretability at the same time. To address this, the authors propose MatterVial, a hybrid framework that fuses latent representations from several pretrained graph neural networks (e.g., MEGNet, ROOST, ORB) with physically meaningful features generated via symbolic regression, and adds a surrogate-model module for interpretable analysis. The design works around the limited interpretability of end-to-end GNNs by pairing GNN representation capacity with symbolic induction. When used to augment the feature-based model MODNet on Matbench tasks, MatterVial yields significant error reductions, with accuracy improvements exceeding 40% on multiple tasks, making it competitive with, and in several cases superior to, state-of-the-art end-to-end GNNs. The framework offers a route toward interpretable, AI-driven materials discovery.
📝 Abstract
This study introduces MatterVial, a hybrid framework for feature-based machine learning in materials science. MatterVial expands the feature space by integrating latent representations from a diverse suite of pretrained graph neural network (GNN) models, spanning structure-based (MEGNet), composition-based (ROOST), and equivariant (ORB) graph networks, together with computationally efficient, GNN-approximated descriptors and novel features from symbolic regression. Our approach combines the chemical transparency of traditional feature-based models with the predictive power of deep learning architectures. When used to augment the feature-based model MODNet on Matbench tasks, it yields significant error reductions and makes MODNet competitive with, and in several cases superior to, state-of-the-art end-to-end GNNs, with accuracy increases exceeding 40% for multiple tasks. An integrated interpretability module, employing surrogate models and symbolic regression, decodes the latent GNN-derived descriptors into explicit, physically meaningful formulas. This unified framework advances materials informatics by providing a high-performance, transparent tool that aligns with the principles of explainable AI, paving the way for more targeted and autonomous materials discovery.
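The two ideas described in the abstract, widening a feature table with latent GNN descriptors and then decoding a latent descriptor with a surrogate model, can be illustrated with a minimal sketch. This is not MatterVial's implementation or API: the arrays are synthetic stand-ins rather than real MEGNet, ROOST, or ORB outputs, the function names `fuse_features` and `fit_linear_surrogate` are hypothetical, and a plain least-squares fit is swapped in for the surrogate models and symbolic regression named in the abstract.

```python
import numpy as np


def fuse_features(*blocks):
    """Concatenate per-material feature blocks column-wise into one table."""
    row_counts = {block.shape[0] for block in blocks}
    if len(row_counts) != 1:
        raise ValueError("all feature blocks must describe the same materials")
    return np.hstack(blocks)


def fit_linear_surrogate(X, y):
    """Least-squares fit of y ~ X @ w + b. Returns (w, b, r2)."""
    design = np.hstack([X, np.ones((X.shape[0], 1))])  # extra column for the intercept
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    pred = design @ coef
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return coef[:-1], coef[-1], 1.0 - ss_res / ss_tot


rng = np.random.default_rng(0)
n_materials = 200

# Synthetic stand-in for interpretable, hand-crafted descriptors (assumption).
chem = rng.normal(size=(n_materials, 3))

# Synthetic stand-in for latent GNN descriptors (assumption). Column 0 is built
# as a hidden linear mix of the interpretable descriptors plus small noise, so
# there is something for the surrogate to recover.
latent = rng.normal(size=(n_materials, 4))
latent[:, 0] = 2.0 * chem[:, 0] - 0.5 * chem[:, 2] + 0.01 * rng.normal(size=n_materials)

# Step 1: expand the feature space by stacking both blocks side by side.
features = fuse_features(chem, latent)

# Step 2: explain one latent descriptor in terms of the interpretable ones.
w, b, r2 = fit_linear_surrogate(chem, latent[:, 0])
print("fused feature table shape:", features.shape)
print("surrogate weights:", np.round(w, 2), "intercept:", round(float(b), 2))
```

In this toy setting the fitted weights recover the linear mix planted in the synthetic latent column, which is the sense in which a surrogate "decodes" a latent descriptor into an explicit formula. Real GNN embeddings need not be linear in any set of hand-crafted descriptors, so a linear fit should be read only as the simplest possible placeholder for that decoding step.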