Pre-training Graph Neural Networks with Structural Fingerprints for Materials Discovery

📅 2025-03-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Graph neural network (GNN) pre-training in materials science is bottlenecked by its reliance on costly quantum-mechanical (QM) data, which limits scalability. Method: The paper proposes a self-supervised pre-training paradigm for GNNs that uses low-cost, efficiently computed structural fingerprints—including radial distribution functions (RDF), angular distribution functions (ADF), and Smooth Overlap of Atomic Positions (SOAP)—as supervisory signals. This removes the dependence on QM-calculated labels and enables scalable construction of atomistic foundation models; training jointly optimizes a multi-task regression objective with feature distillation. Contribution/Results: On diverse downstream materials property prediction tasks, the model matches QM-calibrated baselines while improving data efficiency by over 3× and reducing pre-training compute by two orders of magnitude. This establishes an efficient, accessible, and scalable GNN pre-training paradigm for large-scale materials discovery.
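To make the "cheap supervisory signal" idea concrete, here is a minimal NumPy sketch of one such fingerprint, a radial distribution histogram over pairwise atomic distances. The function name, bin count, and cutoff are illustrative assumptions, not the paper's implementation; the point is that the target is computable from geometry alone, with no QM calculation.

```python
import numpy as np

def rdf_fingerprint(positions, r_max=5.0, n_bins=32):
    """Normalized histogram of pairwise distances: a simple stand-in for
    the radial-distribution-function fingerprints used as pre-training
    targets. Needs only atomic coordinates, no quantum-mechanical data."""
    diffs = positions[:, None, :] - positions[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # keep each unordered pair once; drop self-distances on the diagonal
    pairs = dists[np.triu_indices(len(positions), k=1)]
    hist, _ = np.histogram(pairs, bins=n_bins, range=(0.0, r_max))
    return hist / max(len(pairs), 1)  # normalize to a per-pair density

# toy structure: four atoms at the corners of a unit square
atoms = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0]], dtype=float)
fp = rdf_fingerprint(atoms)  # 32-dimensional regression target
```

Because the target is a fixed-length vector, it can be regressed directly from a GNN's graph-level readout, which is what lets the pre-training scale to datasets where DFT labels would be intractable.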

📝 Abstract
In recent years, pre-trained graph neural networks (GNNs) have been developed as general models which can be effectively fine-tuned for various potential downstream tasks in materials science, and have shown significant improvements in accuracy and data efficiency. The most widely used pre-training methods currently involve either supervised training to fit a general force field or self-supervised training by denoising atomic structures near equilibrium. Both methods require datasets generated from quantum mechanical calculations, which quickly become intractable when scaling to larger datasets. Here we propose a novel pre-training objective which instead uses cheaply-computed structural fingerprints as targets while maintaining comparable performance across a range of different structural descriptors. Our experiments show this approach can act as a general strategy for pre-training GNNs with application towards large-scale foundation models for atomistic data.
Problem

Research questions and friction points this paper is trying to address.

Develop pre-trained GNNs for materials discovery.
Overcome intractable quantum mechanical dataset scaling.
Use structural fingerprints for efficient GNN pre-training.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses structural fingerprints for GNN pre-training
Reduces reliance on quantum mechanical datasets
Maintains performance across structural descriptors
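The summary mentions joint optimization via multi-task regression over several descriptors. The paper does not spell out the objective here, but a minimal NumPy sketch of one plausible form—one regression head per fingerprint, per-descriptor MSE losses summed—could look like this. All names, shapes, and the linear heads are illustrative assumptions standing in for a real GNN readout and learned heads.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for GNN graph-level readouts: one embedding per structure.
emb = rng.normal(size=(8, 16))  # (batch, embedding_dim)

# Hypothetical precomputed fingerprint targets, one set per descriptor.
targets = {"rdf": rng.normal(size=(8, 32)),
           "adf": rng.normal(size=(8, 24)),
           "soap": rng.normal(size=(8, 64))}

# One linear regression head per descriptor (weights would be learned).
heads = {name: rng.normal(size=(16, t.shape[1])) * 0.01
         for name, t in targets.items()}

def multitask_loss(emb, heads, targets):
    """Sum of per-descriptor MSE losses: the multi-task regression part
    of the objective (feature distillation omitted in this sketch)."""
    total = 0.0
    for name, W in heads.items():
        pred = emb @ W  # predict the fingerprint from the embedding
        total += np.mean((pred - targets[name]) ** 2)
    return total

loss = multitask_loss(emb, heads, targets)
```

Summing losses across RDF, ADF, and SOAP heads is one simple way to realize the claim that performance is maintained across different structural descriptors: the shared encoder must produce embeddings predictive of all of them.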