opXRD: Open Experimental Powder X-ray Diffraction Database

📅 2025-03-07
🤖 AI Summary
Problem: Automated analysis of experimental powder X-ray diffraction (pXRD) data is hindered by the scarcity of labeled experimental datasets. Existing models are trained predominantly on simulated data and generalize poorly to real pXRD patterns, particularly those with high noise and elevated backgrounds. Method: The authors introduce opXRD, an openly available database of experimental powder diffractograms from a wide range of materials classes, 2,179 of which are labeled. Contribution/Results: The labeled data allow models to be evaluated on real measurements, and the unlabeled data can support techniques such as transfer learning that aim to narrow the gap between simulation-trained models and experimental data. The authors present the database as an ongoing effort toward fully automated pXRD analysis in self-driving laboratories.

📝 Abstract
Powder X-ray diffraction (pXRD) experiments are a cornerstone of materials structure characterization. Despite their widespread application, analyzing pXRD diffractograms remains a significant challenge for automation and a bottleneck for high-throughput discovery in self-driving labs. Machine learning promises to resolve this bottleneck by enabling automated powder diffraction analysis. A notable difficulty in applying machine learning to this domain is the lack of sufficiently large experimental datasets, which has constrained researchers to train primarily on simulated data. However, models trained on simulated pXRD patterns have shown limited generalization to experimental patterns, particularly low-quality patterns with high noise levels and elevated backgrounds. With the Open Experimental Powder X-Ray Diffraction Database (opXRD), we provide an openly available and easily accessible dataset of labeled and unlabeled experimental powder diffractograms. Labeled opXRD data can be used to evaluate the performance of models on experimental data, and unlabeled opXRD data can help improve that performance, e.g., through transfer learning methods. The collected diffractograms, 2179 of them labeled, span a wide spectrum of materials classes. We hope this ongoing effort can guide machine learning research toward fully automated analysis of pXRD data and thus enable future self-driving materials labs.
Problem

Research questions and friction points this paper is trying to address.

Automating powder X-ray diffraction analysis for materials characterization
Overcoming limited generalization of models trained on simulated data
Providing an open experimental dataset to improve machine learning models
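
The generalization gap noted above can be made concrete with a small sketch. The Python snippet below builds an idealized, noise-free pattern from Gaussian peaks and then degrades it with an elevated background and counting noise, the two effects the abstract names as problematic. Everything in it is an illustrative assumption rather than part of opXRD or the authors' method: the function names `simulate_pattern` and `degrade`, the peak positions and heights, the exponentially decaying background shape, and the Poisson noise model are all hypothetical choices.

```python
import numpy as np


def simulate_pattern(two_theta, peak_positions, peak_heights, fwhm=0.15):
    """Idealized pattern: a sum of Gaussian peaks on a zero baseline (assumed peak shape)."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))  # convert FWHM to standard deviation
    diffs = two_theta[:, None] - np.asarray(peak_positions)[None, :]
    peaks = np.asarray(peak_heights)[None, :] * np.exp(-0.5 * (diffs / sigma) ** 2)
    return peaks.sum(axis=1)


def degrade(intensity, two_theta, rng, background_scale=0.3, counts_at_max=200.0):
    """Add a decaying background and Poisson counting noise (both are assumptions)."""
    span = two_theta.max() - two_theta.min()
    # Hypothetical background: largest at low angles, decaying toward high angles.
    background = background_scale * intensity.max() * np.exp(
        -3.0 * (two_theta - two_theta.min()) / span
    )
    total = intensity + background
    # Scale so the strongest point corresponds to `counts_at_max` expected counts.
    expected_counts = total / total.max() * counts_at_max
    counts = rng.poisson(expected_counts).astype(float)
    return counts / counts.max()  # renormalize to a maximum of 1


# Arbitrary illustrative peaks; not taken from any opXRD entry.
two_theta = np.linspace(10.0, 80.0, 3501)
clean = simulate_pattern(two_theta, [28.4, 47.3, 56.1], [1.0, 0.55, 0.3])

rng = np.random.default_rng(0)
noisy = degrade(clean, two_theta, rng)
```

Lowering `counts_at_max` or raising `background_scale` pushes the degraded pattern toward the low-quality regime, where weak peaks become hard to separate from the baseline. Synthetic degradation of this kind only approximates real measurements, which is why labeled experimental data are needed for evaluation.
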
Innovation

Methods, ideas, or system contributions that make the work stand out.

Open database for experimental pXRD data
Machine learning for automated diffraction analysis
Transfer learning to improve model performance
Authors
Daniel Hollarek
Institute of Theoretical Informatics, Karlsruhe Institute of Technology (KIT), 76131 Karlsruhe, Germany; Institute of Nanotechnology, Karlsruhe Institute of Technology (KIT), 76131 Karlsruhe, Germany
Henrik Schopmans
PhD candidate, Karlsruhe Institute of Technology
machine learning, graph neural networks, materials science, molecular dynamics, coarse-graining
Jona Östreicher
Institute of Theoretical Informatics, Karlsruhe Institute of Technology (KIT), 76131 Karlsruhe, Germany; Institute of Nanotechnology, Karlsruhe Institute of Technology (KIT), 76131 Karlsruhe, Germany
Jonas Teufel
Karlsruhe Institute of Technology
Machine Learning and Optimization
Bin Cao
Guangzhou Municipal Key Laboratory of Materials Informatics, Advanced Materials Thrust, Hong Kong University of Science and Technology (Guangzhou) (HKUST), Guangzhou 511400, China
A. Alwen
Department of Chemical Engineering and Materials Science, University of Southern California (USC), Los Angeles CA 90089, USA
Simon Schweidler
Institute of Nanotechnology, Karlsruhe Institute of Technology (KIT), 76131 Karlsruhe, Germany
Mriganka Singh
Molecular Foundry Division, Lawrence Berkeley National Laboratory (LBNL), Berkeley 94720 CA, USA
Tim Kodalle
Molecular Foundry Division, Lawrence Berkeley National Laboratory (LBNL), Berkeley 94720 CA, USA; Advanced Light Source, Lawrence Berkeley National Laboratory, Berkeley 94720 CA, USA
Hanlin Hu
Hoffmann Institute of Advanced Materials, Shenzhen Polytechnic, Shenzhen 518055, China
Grégoire Heymans
Lawrence Berkeley National Laboratory (LBNL), Chemical Sciences Division, Berkeley 94720 CA, USA
Maged Abdelsamie
Material Science and Engineering Department, King Fahd University of Petroleum and Minerals (KFUPM), Dhahran 31261, Saudi Arabia; Interdisciplinary Research Center for Intelligent Manufacturing and Robotics, King Fahd University of Petroleum and Minerals (KFUPM), Dhahran 31261, Saudi Arabia
Arthur Hardiagon
Chimie ParisTech, PSL University, CNRS, Institut de Recherche de Chimie Paris, 75005 Paris, France
Alexander Wieczorek
Empa–Swiss Federal Laboratories for Materials Science and Technology (EMPA), 8600 Dübendorf, Switzerland
S. Zhuk
Empa–Swiss Federal Laboratories for Materials Science and Technology (EMPA), 8600 Dübendorf, Switzerland
Ruth Schwaiger
Institute of Energy Materials and Devices, Forschungszentrum Juelich GmbH, 52425 Juelich, Germany
S. Siol
Empa–Swiss Federal Laboratories for Materials Science and Technology (EMPA), 8600 Dübendorf, Switzerland
François-Xavier Coudert
Chimie ParisTech, PSL University, CNRS, Institut de Recherche de Chimie Paris, 75005 Paris, France
Moritz Wolf
Engler-Bunte-Institut & Institute of Catalysis Research and Technology, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
Carolin M. Sutter‐Fella
Molecular Foundry Division, Lawrence Berkeley National Laboratory (LBNL), Berkeley 94720 CA, USA
B. Breitung
Institute of Nanotechnology, Karlsruhe Institute of Technology (KIT), 76131 Karlsruhe, Germany
Andrea M. Hodge
Department of Chemical Engineering and Materials Science, University of Southern California (USC), Los Angeles CA 90089, USA
Tong-yi Zhang
Guangzhou Municipal Key Laboratory of Materials Informatics, Advanced Materials Thrust, Hong Kong University of Science and Technology (Guangzhou) (HKUST), Guangzhou 511400, China
Pascal Friederich
Karlsruhe Institute of Technology
Machine Learning, Materials design, Graph Neural Networks, Computational chemistry, Multiscale modeling