🤖 AI Summary
Scientists frequently record experimental metadata in spreadsheets, yet ensuring consistency and standards compliance remains challenging. This paper introduces a spreadsheet-native metadata governance paradigm: customized Excel/CSV templates embed HuBMAP standards; OWL/SKOS ontology-driven controlled vocabularies are integrated; and a web-based real-time semantic validation tool enables immediate, on-entry verification. The approach incorporates semantic constraints into familiar spreadsheet workflows, requiring no platform switching or new system adoption. Deployed across the HuBMAP Consortium, it significantly improved multi-omics metadata compliance rates, increased data entry efficiency, and reduced error identification and correction time by over 70%. To our knowledge, this is the first work to deeply embed ontology-based constraints and real-time semantic validation directly within spreadsheet environments, establishing a scalable, practical paradigm for biomedical metadata standardization.
📝 Abstract
Scientists increasingly recognize the importance of providing rich, standards-adherent metadata to describe their experimental results. Although sophisticated tools are available to assist with data annotation, investigators generally seem to prefer spreadsheets when supplying metadata, even though spreadsheets are limited in their ability to ensure metadata consistency and compliance with formal specifications. In this paper, we describe an end-to-end approach that supports spreadsheet-based entry of metadata while ensuring rigorous adherence to community-based metadata standards and providing quality control. Our methods employ several key components: customizable templates that represent metadata standards and that can inform the spreadsheets investigators use to author metadata; controlled terminologies and ontologies for defining metadata values, which can be accessed directly from a spreadsheet; and an interactive Web-based tool that allows users to rapidly identify and fix errors in their spreadsheet-based metadata. We demonstrate how this approach is being deployed in a biomedical consortium known as HuBMAP to define and collect metadata about a wide range of biological assays.
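To make the validation idea above concrete, the following is a minimal sketch of checking spreadsheet rows against a template that lists required fields and permitted terms. It is not the tool described in the paper: the `TEMPLATE` structure, the field names, the permitted assay terms, and the `validate_rows` function are all hypothetical, chosen only for illustration.

```python
import csv
import io

# Hypothetical template (an assumption, not the HuBMAP specification):
# each field maps to simple rules a spreadsheet cell must satisfy.
TEMPLATE = {
    "sample_id": {"required": True},
    "assay_type": {"required": True, "allowed": {"RNAseq", "ATACseq", "CODEX"}},
    "preparation_temperature": {"required": False, "type": float},
}


def validate_rows(csv_text, template=TEMPLATE):
    """Return a list of (row_number, field, message) tuples, one per problem."""
    errors = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Row 1 is the header, so data rows start at 2, as in a spreadsheet.
    for row_number, row in enumerate(reader, start=2):
        for field, rule in template.items():
            value = (row.get(field) or "").strip()
            if not value:
                if rule.get("required"):
                    errors.append((row_number, field, "missing required value"))
                continue
            allowed = rule.get("allowed")
            if allowed is not None and value not in allowed:
                errors.append(
                    (row_number, field, f"'{value}' is not a permitted term")
                )
            if rule.get("type") is float:
                try:
                    float(value)
                except ValueError:
                    errors.append(
                        (row_number, field, f"'{value}' is not a number")
                    )
    return errors


# Illustrative input: row 3 has two bad values, row 4 lacks a sample_id.
SAMPLE = """sample_id,assay_type,preparation_temperature
S1,RNAseq,4.0
S2,rnaseq,cold
,CODEX,
"""

if __name__ == "__main__":
    for row_number, field, message in validate_rows(SAMPLE):
        print(f"row {row_number}, column {field}: {message}")
```

Reporting each problem with its row number and column name mirrors the goal of letting users locate and fix errors quickly. A real deployment would draw the permitted terms from a terminology service or ontology rather than from a hard-coded set.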