Human-in-the-Loop and AI: Crowdsourcing Metadata Vocabulary for Materials Science

📅 2025-12-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Metadata vocabulary development in materials science is constrained by limited human expertise and inconsistent standardization, which hinders FAIR and FARR data practices. To address this, the paper proposes MatSci-YAMZ, a platform built on AI-HILT (AI generation plus human-in-the-loop crowdsourcing), a closed-loop iterative framework that integrates large language models for automated definition generation, structured crowdsourcing interfaces, multi-round expert feedback, and consistency validation. The work also establishes a reusable research protocol for metadata co-creation that could be applied across disciplines. In a proof of concept with six domain participants, the approach produced 19 AI-generated term definitions that were refined through iterative feedback. The results indicate that the method is feasible and has the potential to enhance semantic transparency, reduce consensus-building time, and scale across domains, supporting open, interoperable materials data infrastructure.

📝 Abstract

Metadata vocabularies are essential for advancing FAIR and FARR data principles, but their development is constrained by limited human resources and inconsistent standardization practices. This paper introduces MatSci-YAMZ, a platform that integrates artificial intelligence (AI) and human-in-the-loop (HILT), including crowdsourcing, to support metadata vocabulary development. The paper reports on a proof-of-concept use case evaluating the AI-HILT model in materials science, a highly interdisciplinary domain. Six (6) participants affiliated with the NSF Institute for Data-Driven Dynamical Design (ID4) engaged with the MatSci-YAMZ platform over several weeks, contributing term definitions and providing examples to prompt the refinement of AI-generated definitions. Nineteen (19) AI-generated definitions were successfully created, with iterative feedback loops demonstrating the feasibility of AI-HILT refinement. Findings confirm the feasibility of the AI-HILT model, highlighting 1) a successful proof of concept, 2) alignment with FAIR and open-science principles, 3) a research protocol to guide future studies, and 4) the potential for scalability across domains. Overall, MatSci-YAMZ's underlying model has the capacity to enhance semantic transparency and reduce the time required for consensus building and metadata vocabulary development.

Problem

Research questions and friction points this paper is trying to address.

Develops AI-human platform for metadata vocabulary creation

Addresses limited resources in materials science standardization

Enhances semantic transparency and reduces consensus-building time

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates AI with human-in-the-loop crowdsourcing for metadata development

Uses iterative feedback loops to refine AI-generated term definitions

Provides a scalable model aligning with FAIR and open-science principles
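
The iterative feedback loop described above can be illustrated with a minimal Python sketch. It is not the MatSci-YAMZ implementation: the names `TermEntry`, `build_prompt`, and `refine`, the prompt layout, and the stopping rule are all hypothetical, and the AI model and the human reviewer are replaced by stub functions so that the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class TermEntry:
    """One vocabulary term moving through a hypothetical AI-HILT loop."""
    term: str
    human_definition: str                      # contributed by a participant
    examples: List[str] = field(default_factory=list)
    ai_definition: str = ""                    # latest AI-generated draft
    feedback: List[str] = field(default_factory=list)
    rounds: int = 0


def build_prompt(entry: TermEntry) -> str:
    """Assemble the text sent to the generator (layout is an assumption)."""
    parts = [f"Term: {entry.term}",
             f"Contributor definition: {entry.human_definition}"]
    parts += [f"Example: {e}" for e in entry.examples]
    parts += [f"Feedback: {f}" for f in entry.feedback]
    return "\n".join(parts)


def refine(entry: TermEntry,
           generate: Callable[[str], str],
           review: Callable[[TermEntry], Optional[str]],
           max_rounds: int = 3) -> TermEntry:
    """Alternate AI generation and human review until the reviewer accepts
    (review returns None) or max_rounds is reached."""
    for _ in range(max_rounds):
        entry.ai_definition = generate(build_prompt(entry))
        entry.rounds += 1
        comment = review(entry)
        if comment is None:
            break
        entry.feedback.append(comment)
    return entry


# Stubs standing in for a real language model and a real human reviewer.
def stub_generate(prompt: str) -> str:
    return f"draft v{prompt.count('Feedback:') + 1}"


def stub_review(entry: TermEntry) -> Optional[str]:
    return None if entry.feedback else "mention typical units"


result = refine(
    TermEntry("band gap",
              "Energy difference between valence and conduction bands",
              examples=["Silicon has a band gap of about 1.1 eV"]),
    stub_generate,
    stub_review,
)
```

In this sketch each round feeds all earlier reviewer comments back into the prompt, which is one simple way to make the refinement history visible to the generator; a real deployment would replace the stubs with a model call and a crowdsourcing interface.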

🔎 Similar Papers

Toward Reliable Ad-hoc Scientific Information Extraction: A Case Study on Two Materials Datasets

2024-06-08 · Annual Meeting of the Association for Computational Linguistics · Citations: 2

Construction and Application of Materials Knowledge Graph in Multidisciplinary Materials Science via Large Language Model

2024-04-03 · Neural Information Processing Systems · Citations: 1