🤖 AI Summary
Metadata vocabulary development in materials science faces two challenges, limited human resources and inconsistent standardization, which hinder FAIR/FARR data practices. To address this, we propose MatSci-YAMZ, a platform built on AI-HILT (AI generation plus human-in-the-loop crowdsourcing), a closed-loop iterative framework that combines large language models for automated definition generation, structured crowdsourcing interfaces, multi-round expert feedback, and consistency validation. The work also contributes a reusable research protocol for cross-disciplinary metadata co-creation. In a proof-of-concept study with six participants from the NSF Institute for Data-Driven Dynamical Design (ID4), the approach produced 19 AI-generated term definitions that were refined through iterative feedback. The results demonstrate the feasibility of the method and its alignment with FAIR and open-science principles, and they suggest that it can enhance semantic transparency, reduce the time needed for consensus building, and scale across domains, thereby supporting reproducible, interoperable materials data infrastructure.
📝 Abstract
Metadata vocabularies are essential for advancing FAIR and FARR data principles, but their development is constrained by limited human resources and inconsistent standardization practices. This paper introduces MatSci-YAMZ, a platform that integrates artificial intelligence (AI) and human-in-the-loop (HILT) methods, including crowdsourcing, to support metadata vocabulary development. The paper reports on a proof-of-concept use case evaluating the AI-HILT model in materials science, a highly interdisciplinary domain. Six (6) participants affiliated with the NSF Institute for Data-Driven Dynamical Design (ID4) engaged with the MatSci-YAMZ platform over several weeks, contributing term definitions and providing examples to prompt refinement of the AI-generated definitions. Nineteen (19) AI-generated definitions were successfully created, and the iterative feedback loops demonstrated the feasibility of AI-HILT refinement. The findings confirm the feasibility of the AI-HILT model, highlighting 1) a successful proof of concept, 2) alignment with FAIR and open-science principles, 3) a research protocol to guide future studies, and 4) the potential for scalability across domains. Overall, the model underlying MatSci-YAMZ has the capacity to enhance semantic transparency and to reduce the time required for consensus building and metadata vocabulary development.