Building informative materials datasets beyond targeted objectives

📅 2026-05-06
📈 Citations: 0
Influential: 0

📝 Abstract
Materials science data collection can be expensive, making the reuse and long-term utility of datasets critically important for future discovery campaigns. In practice, researchers prioritize a subset of properties according to their research interests. However, ignoring a subset of outcomes in data collection campaigns can produce datasets that are poorly suited to future learning tasks. Here, we present a framework for dataset construction that maximizes informativeness for target properties of interest while preserving performance on untargeted ones. Our approach uses diversity-aware selection to ensure broad coverage of the materials space. In noisy experimental dataset construction, we find that without our diversity-aware framework, prediction performance on untargeted properties can degrade by up to 40% relative to random sampling, whereas applying our framework yields improvements of up to 10%. For targeted properties, performance can degrade by up to 12.5% relative to random sampling without diversity, while our framework achieves gains of up to 25%. Incorporating diversity into dataset construction not only preserves informativeness for the targeted properties but also improves materials coverage for potential future objectives. As a result, the constructed datasets remain broadly informative across considered and unconsidered outcomes, ensuring unbiased quality entries and mitigating cold-start limitations in subsequent modeling and discovery campaigns.
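The abstract does not spell out the selection rule, so the following is only an illustrative sketch of what diversity-aware selection can look like, not the authors' method. It assumes each candidate material has a feature vector and a scalar utility for the targeted property, and it greedily picks the candidate with the best weighted sum of normalized utility and distance to the points already chosen. The function name `greedy_diverse_select`, the `diversity_weight` parameter, and the toy data are all hypothetical.

```python
import math


def greedy_diverse_select(features, utility, k, diversity_weight=0.5):
    """Greedily pick k candidate indices, trading utility against diversity.

    Hypothetical scoring rule (an assumption, not taken from the paper):
    score = (1 - w) * normalized utility + w * normalized distance to the
    nearest already-selected candidate.
    """
    n = len(features)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of candidates")

    # Rescale utilities to [0, 1] so they are comparable with distances.
    u_lo, u_hi = min(utility), max(utility)
    span = (u_hi - u_lo) or 1.0
    u_norm = [(u - u_lo) / span for u in utility]

    # Seed the selection with the highest-utility candidate.
    first = max(range(n), key=lambda i: u_norm[i])
    selected = [first]
    # Distance from every candidate to its nearest selected point.
    min_dist = [math.dist(features[i], features[first]) for i in range(n)]

    while len(selected) < k:
        d_max = max(min_dist) or 1.0
        chosen = set(selected)
        best = max(
            (i for i in range(n) if i not in chosen),
            key=lambda i: (1 - diversity_weight) * u_norm[i]
            + diversity_weight * min_dist[i] / d_max,
        )
        selected.append(best)
        # Update nearest-selected distances with the new pick.
        for i in range(n):
            min_dist[i] = min(min_dist[i], math.dist(features[i], features[best]))
    return selected


# Toy data: three high-utility candidates clustered near the origin and
# two lower-utility candidates in a distant region of feature space.
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (10.0, 10.0), (10.1, 10.0)]
utility = [1.0, 0.9, 0.8, 0.2, 0.1]

utility_only = greedy_diverse_select(features, utility, k=2, diversity_weight=0.0)
with_diversity = greedy_diverse_select(features, utility, k=2, diversity_weight=0.5)
print(utility_only)    # [0, 1]: both picks come from the same cluster
print(with_diversity)  # [0, 3]: the second pick covers the distant region
```

With `diversity_weight=0.0` the sketch reduces to pure utility ranking and samples only one region; a nonzero weight spreads picks across the feature space, which is the kind of coverage the abstract credits for preserving performance on untargeted properties.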
Problem

Research questions and friction points this paper is trying to address.

materials datasets
dataset construction
informativeness
targeted properties
diversity
Innovation

Methods, ideas, or system contributions that make the work stand out.

diversity-aware selection
materials informatics
dataset construction
multi-property prediction
cold-start mitigation
Authors
Rafael Espinosa Castañeda
Department of Materials Science and Engineering, University of Toronto, Canada.
Ashley Dale
Department of Materials Science and Engineering, University of Toronto, Canada.
Hongchen Wang
Department of Materials Science and Engineering, University of Toronto, Canada.
Yonatan Kurniawan
Department of Materials Science and Engineering, University of Toronto, Canada.
Hao Wan
Department of Materials Science and Engineering, University of Toronto, Canada.
Runze Zhang
Department of Materials Science and Engineering, University of Toronto, Canada.
Adji Bousso Dieng
Assistant Professor of Computer Science, Princeton University
Machine Learning, Artificial Intelligence, Natural Sciences
Kangming Li
Assistant Professor at King Abdullah University of Science and Technology (KAUST)
Materials informatics, first principles calculations, machine learning
Jason Hattrick-Simpers
Department of Materials Science and Engineering, University of Toronto
artificial intelligence, autonomous science, combinatorial materials science, compositionally complex alloys, metallic glasses