Combining feature-based approaches with graph neural networks and symbolic regression for synergistic performance and interpretability

📅 2025-09-02

🤖 AI Summary

In materials science, machine learning models often struggle to achieve high predictive accuracy and chemical interpretability at the same time. To address this, this work proposes MatterVial, a hybrid framework that combines latent representations from several pretrained graph neural networks (MEGNet, ROOST, ORB) with GNN-approximated descriptors and physically meaningful features generated via symbolic regression, and adds a surrogate-model module for interpretable analysis. The design aims to pair the representational power of GNNs with the transparency of symbolic, feature-based models, which end-to-end GNNs do not offer on their own. When used to augment the feature-based model MODNet on Matbench tasks, MatterVial substantially reduces errors, with accuracy gains exceeding 40% on several tasks, and performs competitively with, and in several cases better than, state-of-the-art end-to-end GNNs. The framework points toward a more interpretable, AI-driven approach to materials discovery.

📝 Abstract

This study introduces MatterVial, an innovative hybrid framework for feature-based machine learning in materials science. MatterVial expands the feature space by combining latent representations from a diverse suite of pretrained graph neural network (GNN) models, namely structure-based (MEGNet), composition-based (ROOST), and equivariant (ORB) graph networks, with computationally efficient GNN-approximated descriptors and novel features from symbolic regression. Our approach unites the chemical transparency of traditional feature-based models with the predictive power of deep learning architectures. When used to augment the feature-based model MODNet on Matbench tasks, this method yields significant error reductions and raises its performance to a level competitive with, and in several cases superior to, state-of-the-art end-to-end GNNs, with accuracy increases exceeding 40% for multiple tasks. An integrated interpretability module, employing surrogate models and symbolic regression, decodes the latent GNN-derived descriptors into explicit, physically meaningful formulas. This unified framework advances materials informatics by providing a high-performance, transparent tool that aligns with the principles of explainable AI, paving the way for more targeted and autonomous materials discovery.
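The abstract describes two steps: enlarging a descriptor matrix with latent GNN features, and decoding a latent feature into an explicit formula. The following is a minimal sketch of both ideas on synthetic data, not MatterVial's implementation. The descriptors `x1` and `x2`, the synthetic "latent" feature `z`, the candidate-term library, and the helpers `augment` and `fit_surrogate` are all hypothetical. The surrogate here is a plain least-squares fit over a fixed library of candidate terms, a simplified stand-in for genuine symbolic regression.

```python
"""Sketch: augment descriptors with latent features, then decode one latent
feature into an explicit formula with a least-squares surrogate.

All data are synthetic. x1, x2 and the latent feature z are hypothetical
stand-ins, not outputs of MEGNet, ROOST, or ORB.
"""


def augment(descriptors, latent):
    """Concatenate each sample's descriptor row with its latent-feature row."""
    if len(descriptors) != len(latent):
        raise ValueError("descriptor and latent matrices need the same number of rows")
    return [d + z for d, z in zip(descriptors, latent)]


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


# Hypothetical library of candidate terms built from two descriptors x1, x2.
TERMS = {
    "1": lambda x1, x2: 1.0,
    "x1": lambda x1, x2: x1,
    "x2": lambda x1, x2: x2,
    "x1*x2": lambda x1, x2: x1 * x2,
    "x1^2": lambda x1, x2: x1 * x1,
    "x2^2": lambda x1, x2: x2 * x2,
}


def fit_surrogate(samples, target, tol=1e-6):
    """Least-squares fit of target on the term library; drop near-zero coefficients."""
    names = list(TERMS)
    rows = [[TERMS[n](x1, x2) for n in names] for x1, x2 in samples]
    k = len(names)
    # Normal equations: (A^T A) w = A^T y
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    weights = solve(ata, aty)
    return {n: round(w, 6) for n, w in zip(names, weights) if abs(w) > tol}


# Synthetic data on a 5x5 grid: the fake latent feature equals 1 + 2*x1 - 0.5*x2^2.
descriptors = [[x1 / 4, x2 / 4] for x1 in range(5) for x2 in range(5)]
latent = [[1 + 2 * x1 - 0.5 * x2 * x2] for x1, x2 in descriptors]
features = augment(descriptors, latent)  # each row: [x1, x2, z]

formula = fit_surrogate([(r[0], r[1]) for r in features], [r[2] for r in features])
print(" + ".join(f"{w}*{n}" for n, w in formula.items()))
```

Because the synthetic latent feature was built from the library's own terms, the fit recovers it exactly. A real latent GNN feature would likely need a richer term library, a sparsity penalty, and held-out validation before the resulting formula could be trusted.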

Problem

Combining feature-based models and graph neural networks for materials science

Enhancing predictive performance while maintaining interpretability

Decoding latent descriptors into physically meaningful formulas

Innovation

Combining graph neural networks with symbolic regression

Integrating latent representations from pretrained GNN models

Employing surrogate models and symbolic regression to decode latent features into interpretable formulas