Combining feature-based approaches with graph neural networks and symbolic regression for synergistic performance and interpretability

📅 2025-09-02

🤖 AI Summary
In materials science, machine learning models often struggle to achieve high predictive accuracy and chemical interpretability at the same time. To address this, the authors propose MatterVial, a hybrid framework that combines latent representations from pretrained graph neural networks (e.g., MEGNet, ROOST, ORB) with GNN-approximated descriptors and features generated by symbolic regression, and adds an interpretability module based on surrogate models. The design targets the limited interpretability of end-to-end GNNs by pairing GNN representation capacity with explicit symbolic formulas. When used to augment the feature-based model MODNet on the Matbench benchmark, MatterVial yields significant error reductions, with accuracy improvements exceeding 40% on multiple tasks, making it competitive with, and in several cases better than, state-of-the-art end-to-end GNNs. The framework offers a route toward interpretable, AI-driven materials discovery.

📝 Abstract
This study introduces MatterVial, an innovative hybrid framework for feature-based machine learning in materials science. MatterVial expands the feature space by integrating latent representations from a diverse suite of pretrained graph neural network (GNN) models, including structure-based (MEGNet), composition-based (ROOST), and equivariant (ORB) graph networks, together with computationally efficient, GNN-approximated descriptors and novel features from symbolic regression. Our approach combines the chemical transparency of traditional feature-based models with the predictive power of deep learning architectures. When used to augment the feature-based model MODNet on Matbench tasks, this method yields significant error reductions and makes its performance competitive with, and in several cases superior to, state-of-the-art end-to-end GNNs, with accuracy increases exceeding 40% for multiple tasks. An integrated interpretability module, employing surrogate models and symbolic regression, decodes the latent GNN-derived descriptors into explicit, physically meaningful formulas. This unified framework advances materials informatics by providing a high-performance, transparent tool that aligns with the principles of explainable AI, paving the way for more targeted and autonomous materials discovery.
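As a rough illustration of what "expanding the feature space" can mean in practice, the sketch below merges hand-crafted descriptors with latent vectors from several pretrained models into one flat feature row. It is only a minimal sketch: the function `augment_features`, the feature names, and all numeric values are hypothetical placeholders, and no real MEGNet, ROOST, or ORB model is called.

```python
from typing import Dict, List


def augment_features(
    base: Dict[str, float],
    latent_blocks: Dict[str, List[float]],
) -> Dict[str, float]:
    """Merge hand-crafted descriptors with named latent vectors into one flat row."""
    row = dict(base)  # copy so the caller's descriptors are not modified
    for source, vector in latent_blocks.items():
        for i, value in enumerate(vector):
            name = f"{source}_latent_{i}"
            if name in row:
                raise ValueError(f"duplicate feature name: {name}")
            row[name] = float(value)
    return row


# Placeholder descriptors for a single material (values are made up).
base = {"mean_atomic_mass": 28.1, "mean_electronegativity": 1.9}

# Placeholder latent vectors; in a real pipeline these would be read from
# the hidden layers of pretrained GNNs.
latents = {
    "megnet": [0.12, -0.40, 0.88],
    "roost": [0.05, 0.31],
    "orb": [-0.77, 0.10],
}

row = augment_features(base, latents)
```

The resulting row keeps the original descriptor names, so a downstream feature-based model can still report which inputs, hand-crafted or latent, it relied on.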
Problem

Research questions and friction points this paper is trying to address.

Combining feature-based models and graph neural networks for materials science
Enhancing predictive performance while maintaining interpretability
Decoding latent descriptors into physically meaningful formulas
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combining graph neural networks with symbolic regression
Integrating latent representations from pretrained GNN models
Employing surrogate models and symbolic regression to decode latent descriptors into interpretable formulas
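The idea of decoding a latent descriptor into an explicit formula can be sketched with a toy brute-force search. This is not the paper's interpretability module or a real symbolic regression engine: it is a deliberately small stand-in that tries a few candidate transforms of each interpretable feature, fits `a*f(x) + b` by least squares, and keeps the best fit. The names `TRANSFORMS`, `fit_line`, and `decode_latent`, as well as the data, are hypothetical, and the features are assumed to be strictly positive.

```python
import math
from typing import Callable, Dict, List, Tuple

# Tiny, hypothetical expression library (assumes strictly positive inputs).
TRANSFORMS: Dict[str, Callable[[float], float]] = {
    "{x}": lambda v: v,
    "{x}^2": lambda v: v * v,
    "sqrt({x})": lambda v: math.sqrt(v),
    "1/{x}": lambda v: 1.0 / v,
}


def fit_line(t: List[float], y: List[float]) -> Tuple[float, float, float]:
    """Least-squares fit y ~ a*t + b; returns (a, b, sum of squared residuals)."""
    n = len(t)
    mean_t, mean_y = sum(t) / n, sum(y) / n
    var = sum((ti - mean_t) ** 2 for ti in t)
    cov = sum((ti - mean_t) * (yi - mean_y) for ti, yi in zip(t, y))
    a = cov / var if var != 0.0 else 0.0
    b = mean_y - a * mean_t
    sse = sum((yi - (a * ti + b)) ** 2 for ti, yi in zip(t, y))
    return a, b, sse


def decode_latent(
    features: Dict[str, List[float]], latent: List[float]
) -> Tuple[str, float]:
    """Return the best-fitting formula string and its squared error."""
    best_expr, best_sse = "", math.inf
    for name, column in features.items():
        for template, fn in TRANSFORMS.items():
            transformed = [fn(v) for v in column]
            a, b, sse = fit_line(transformed, latent)
            if sse < best_sse:
                term = template.format(x=name)
                best_expr, best_sse = f"{a:.3f}*{term} + {b:.3f}", sse
    if best_sse == math.inf:
        raise ValueError("no features supplied")
    return best_expr, best_sse


# Synthetic data: the "latent" value is constructed as 2*radius^2 + 1.
features = {
    "radius": [1.0, 2.0, 3.0, 4.0, 5.0],
    "mass": [10.0, 40.0, 20.0, 50.0, 30.0],
}
latent = [2.0 * r * r + 1.0 for r in features["radius"]]

expr, sse = decode_latent(features, latent)
print(expr)
```

On this synthetic data the search recovers the generating expression. Real latent descriptors are rarely this clean, which is why a dedicated symbolic regression method with a richer expression space would be needed in practice.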
Rogério Almeida Gouvêa
Laboratory of Applied Materials and Interfaces, Federal University of Rio Grande do Sul, Porto Alegre, RS 91501-970, Brazil
Pierre-Paul De Breuck
Institute of Condensed Matter and Nanosciences, Université catholique de Louvain, Louvain-la-Neuve, Belgium
Tatiane Pretto
Laboratory of Applied Materials and Interfaces, Federal University of Rio Grande do Sul, Porto Alegre, RS 91501-970, Brazil
Gian-Marco Rignanese
Research Director, F.R.S.-FNRS / Professor, UCLouvain
Marcos José Leite dos Santos
Laboratory of Applied Materials and Interfaces, Federal University of Rio Grande do Sul, Porto Alegre, RS 91501-970, Brazil