Works-magnet: Accelerating Metadata Curation for Open Science

📅 2025-06-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: In open science, bibliographic and research data metadata are highly heterogeneous, require complex processing pipelines, and depend heavily on manual curation and validation. Method: This paper proposes an interpretable and editable AI-driven metadata enhancement system. It integrates rule-augmented NLP parsing with institutional affiliation disambiguation algorithms to construct an open-source, auditable processing pipeline. Crucially, it introduces the first visualization of the AI-based metadata generation process, enabling real-time, expert-guided interactive correction. Contribution/Results: The system significantly reduces manual verification effort while improving both the accuracy and the timeliness of institutional attribution metadata, as demonstrated empirically on OpenAlex. Its implementation is fully open-sourced and has been deployed in France’s national open science monitoring infrastructure.

📝 Abstract
The transition to Open Science necessitates robust and reliable metadata. While national initiatives, such as the French Open Science Monitor, aim to track this evolution using open data, reliance on proprietary databases persists in many places. Open platforms like OpenAlex still require significant human intervention for data accuracy. This paper introduces Works-magnet, a project by the French Ministry of Higher Education and Research (MESR) Data Science & Engineering Team. Works-magnet is designed to accelerate the curation of bibliographic and research data metadata, particularly affiliations, by making automated AI calculations visible and correctable. It addresses challenges related to metadata heterogeneity, complex processing chains, and the need for human curation in a diverse research landscape. The paper details Works-magnet's concepts and the observed limitations, and outlines future directions for enhancing open metadata quality and reusability. The Works-magnet app is open source on GitHub: https://github.com/dataesr/works-magnet
Problem

Research questions and friction points this paper is trying to address.

Accelerating metadata curation for Open Science
Addressing metadata heterogeneity and complexity
Reducing human intervention in data accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated AI for metadata curation
Visible and correctable AI calculations
Open source platform for affiliations
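
The idea of making automated affiliation matches "visible and correctable" can be illustrated with a minimal sketch. It is not the Works-magnet implementation and does not follow the OpenAlex schema: the `Match` record, the function names, and the identifiers such as `I-A` are all hypothetical, chosen only to show how a curator could review each distinct affiliation string once and override the automated result.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """One automated match (hypothetical record, not a real schema)."""
    work_id: str          # identifier of the publication
    raw_affiliation: str  # affiliation string as written in the publication
    institution_id: str   # institution assigned by the automated matcher


def group_by_raw_affiliation(matches):
    """Group matches by normalized affiliation string for review."""
    groups = defaultdict(lambda: {"institution_ids": set(), "work_ids": []})
    for m in matches:
        # Normalize case and whitespace so near-identical strings share a group.
        key = " ".join(m.raw_affiliation.lower().split())
        groups[key]["institution_ids"].add(m.institution_id)
        groups[key]["work_ids"].append(m.work_id)
    return dict(groups)


def apply_corrections(groups, corrections):
    """Return new groups where curator corrections override automated ids.

    `corrections` maps a normalized affiliation string to the set of
    institution ids the curator considers correct.
    """
    corrected = {}
    for key, group in groups.items():
        ids = corrections.get(key, group["institution_ids"])
        corrected[key] = {
            "institution_ids": set(ids),
            "work_ids": list(group["work_ids"]),
        }
    return corrected


# Illustrative data only.
matches = [
    Match("W1", "Univ. Example, Paris", "I-A"),
    Match("W2", "univ. example,  paris", "I-A"),
    Match("W3", "Example Lab", "I-B"),
]
groups = group_by_raw_affiliation(matches)
corrected = apply_corrections(groups, {"example lab": {"I-C"}})
```

Grouping by the raw string means a single correction applies to every work that shares that string, which is one plausible way curation effort could be reduced; the original automated result is left untouched in `groups` so both views stay inspectable.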