Human-in-the-Loop and AI: Crowdsourcing Metadata Vocabulary for Materials Science

📅 2025-12-10
🤖 AI Summary
Materials science metadata vocabulary development faces two challenges, limited human expertise and inconsistent standardization, which hinder FAIR/FARR data practices. To address this, we propose MatSci-YAMZ, a platform built on AI-HILT (AI-generated definitions plus human-in-the-loop crowdsourcing). AI-HILT is an iterative framework: large language models generate term definitions, and participants refine them over multiple rounds by contributing definitions, examples, and feedback. The work also contributes a research protocol to guide future metadata co-creation studies. In a proof-of-concept study, six participants from the NSF Institute for Data-Driven Dynamical Design (ID4) used the platform, which produced 19 AI-generated term definitions refined through iterative feedback. The results indicate that the method is feasible and aligns with FAIR and open-science principles. They also suggest it could improve semantic transparency, reduce consensus-building time, and scale across domains, supporting reproducible, interoperable materials data infrastructure.

📝 Abstract
Metadata vocabularies are essential for advancing FAIR and FARR data principles, but their development is constrained by limited human resources and inconsistent standardization practices. This paper introduces MatSci-YAMZ, a platform that integrates artificial intelligence (AI) and human-in-the-loop (HILT) methods, including crowdsourcing, to support metadata vocabulary development. The paper reports on a proof-of-concept use case evaluating the AI-HILT model in materials science, a highly interdisciplinary domain. Six (6) participants affiliated with the NSF Institute for Data-Driven Dynamical Design (ID4) engaged with the MatSci-YAMZ platform over several weeks, contributing term definitions and providing examples to prompt refinement of the AI-generated definitions. Nineteen (19) AI-generated definitions were successfully created, with iterative feedback loops demonstrating the feasibility of AI-HILT refinement. Findings confirm the feasibility of the AI-HILT model, highlighting 1) a successful proof of concept, 2) alignment with FAIR and open-science principles, 3) a research protocol to guide future studies, and 4) the potential for scalability across domains. Overall, MatSci-YAMZ's underlying model has the capacity to enhance semantic transparency and reduce the time required for consensus building and metadata vocabulary development.
Problem

Research questions and friction points this paper is trying to address.

Develops AI-human platform for metadata vocabulary creation
Addresses limited human resources and inconsistent standardization in materials science metadata
Aims to enhance semantic transparency and reduce consensus-building time
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates AI with human-in-the-loop crowdsourcing for metadata development
Uses iterative feedback loops to refine AI-generated term definitions
Provides a model aligned with FAIR and open-science principles, with potential to scale across domains
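The iterative refinement of AI-generated definitions described above can be sketched as a small loop. This is a minimal, hypothetical illustration, not MatSci-YAMZ's actual implementation. Its assumptions are:

- `TermRecord`, `build_prompt`, and `refine` are invented names.
- The LLM call is passed in as a plain function, so no real model API is assumed.
- A single reviewer callback stands in for the multi-participant human feedback the paper describes.
- The "vacancy" term and its drafts are made-up example data.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical types: an LLM call maps a prompt to a draft definition,
# and a human reviewer maps a draft to (approved?, comment).
Generator = Callable[[str], str]
Reviewer = Callable[[str], Tuple[bool, str]]


@dataclass
class TermRecord:
    """One vocabulary term tracked through AI-HILT refinement (illustrative)."""
    term: str
    examples: List[str] = field(default_factory=list)     # contributor usage examples
    definitions: List[str] = field(default_factory=list)  # draft history, newest last
    feedback: List[str] = field(default_factory=list)     # reviewer comments so far
    approved: bool = False


def build_prompt(record: TermRecord) -> str:
    """Assemble an LLM prompt from the term, examples, last draft, and feedback."""
    lines = [f"Define the materials science term '{record.term}'."]
    if record.examples:
        lines.append("Usage examples from contributors:")
        lines.extend(f"- {ex}" for ex in record.examples)
    if record.definitions:
        lines.append(f"Previous definition: {record.definitions[-1]}")
    if record.feedback:
        lines.append("Reviewer feedback to address:")
        lines.extend(f"- {fb}" for fb in record.feedback)
    return "\n".join(lines)


def refine(record: TermRecord, generate: Generator, review: Reviewer,
           max_rounds: int = 3) -> TermRecord:
    """Alternate AI drafting and human review until approval or max_rounds."""
    for _ in range(max_rounds):
        draft = generate(build_prompt(record))
        record.definitions.append(draft)
        approved, comment = review(draft)
        if approved:
            record.approved = True
            break
        # A rejected draft's reviewer comment is fed into the next prompt.
        record.feedback.append(comment)
    return record


if __name__ == "__main__":
    # Stubbed LLM and reviewer so the loop runs without any external service.
    drafts = iter([
        "A crystal defect.",
        "A point defect where an atom is missing from a lattice site.",
    ])
    rec = refine(
        TermRecord("vacancy", examples=["vacancy diffusion in metals"]),
        generate=lambda prompt: next(drafts),
        review=lambda d: ("lattice" in d, "Mention the crystal lattice."),
    )
    print(rec.approved, len(rec.definitions), rec.feedback)
    # prints: True 2 ['Mention the crystal lattice.']
```

Passing the generator and reviewer in as functions keeps the sketch independent of any particular LLM provider. In a crowdsourced setting, the single reviewer callback would presumably be replaced by comments or votes aggregated from several participants.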
Jane Greenberg
Metadata Research Center, College of Computing and Informatics, Drexel University
Scott McClellan
Metadata Research Center, College of Computing and Informatics, Drexel University
Addy Ireland
Penn State University
Robert Sammarco
Metadata Research Center, College of Computing and Informatics, Drexel University
Colton Gerber
Colorado School of Mines
Christopher B. Rauch
Metadata Research Center, College of Computing and Informatics, Drexel University
Mat Kelly
Assistant Professor, College of Computing and Informatics, Drexel University
Web Archiving, Information Visualization, Web Science, Information Retrieval, @WebSciDL
John Kunze
Metadata Research Center, College of Computing and Informatics, Drexel University
Yuan An
College of Computing and Informatics, Drexel University
Data Integration, Knowledge Graph, Ontology, Data Mining, Machine Learning
Eric Toberer
Colorado School of Mines