AI Summary
To address low transparency, poor reproducibility, and weak discoverability in AI and data science research, this paper proposes and implements a semantically driven framework for research knowledge graphs. The authors design and deploy the NFDI4DS ontology and a standardized metadata schema. They combine community-shared vocabularies, automated information extraction, and a modular knowledge graph construction pipeline to semantically integrate datasets, models, software, and publications. Key contributions include: (1) the first community-developed, comprehensive ontology for data science artifacts; (2) a scalable, FAIR-compliant architecture for interlinking knowledge; and (3) an open-source toolchain that has already been adopted in multiple real-world research projects. Evaluation results show significant improvements in the machine interpretability, cross-platform interoperability, and computational reproducibility of research assets.
Abstract
As research in Artificial Intelligence and Data Science grows in volume and complexity, ensuring transparency, reproducibility, and discoverability becomes increasingly difficult. Research artifacts should be understandable and usable by machines. To meet these challenges, the NFDI4DataScience (NFDI4DS) consortium is developing and providing Research Knowledge Graphs (RKGs). Building on earlier work, this paper presents recent progress in creating semantically rich RKGs using standardized ontologies, shared vocabularies, and automated information extraction techniques. Key achievements include the NFDI4DS ontology, metadata standards, and tools and services designed to support the FAIR principles. They also include community-led projects and several implementations of RKGs. Together, these efforts aim to capture and connect the complex relationships between datasets, models, software, and scientific publications.
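The following sketch is a rough illustration of the kind of linking an RKG enables. It stores artifacts as subject-predicate-object triples, then answers two typical questions: which models were trained on a given dataset, and which artifacts are connected to a given publication.

This is a minimal sketch under stated assumptions, not the NFDI4DS implementation. All `ex:` terms (`ex:introduces`, `ex:trainedOn`, `ex:implementedIn`, the class names, and the node identifiers) are hypothetical placeholders, not terms from the NFDI4DS ontology. A production RKG would typically use RDF tooling and SPARQL; this sketch uses only the standard library so that it stays self-contained.

```python
from collections import defaultdict
from typing import NamedTuple

# Hypothetical vocabulary: "ex:" terms are placeholders, NOT terms from the
# actual NFDI4DS ontology. "rdf:type" marks an artifact's class.
RDF_TYPE = "rdf:type"


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


class ResearchKG:
    """Minimal in-memory triple store indexed by subject and by object."""

    def __init__(self) -> None:
        self.triples: set[Triple] = set()
        self._out: dict[str, set[Triple]] = defaultdict(set)  # keyed by subject
        self._in: dict[str, set[Triple]] = defaultdict(set)   # keyed by object

    def add(self, s: str, p: str, o: str) -> None:
        t = Triple(s, p, o)
        self.triples.add(t)
        self._out[s].add(t)
        self._in[o].add(t)

    def types_of(self, node: str) -> list[str]:
        """Classes asserted for `node` via rdf:type."""
        return sorted(t.obj for t in self._out[node] if t.predicate == RDF_TYPE)

    def subjects(self, p: str, o: str) -> list[str]:
        """All subjects s such that (s, p, o) is in the graph."""
        return sorted(t.subject for t in self._in[o] if t.predicate == p)

    def linked_artifacts(self, start: str) -> list[str]:
        """Artifacts reachable from `start` via non-type links, in either direction."""
        seen, stack = {start}, [start]
        while stack:
            node = stack.pop()
            # Skip rdf:type edges so that artifacts are not linked merely by sharing a class.
            nxt_nodes = [t.obj for t in self._out[node] if t.predicate != RDF_TYPE]
            nxt_nodes += [t.subject for t in self._in[node] if t.predicate != RDF_TYPE]
            for nxt in nxt_nodes:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return sorted(seen - {start})


# Hypothetical example: one publication, model, dataset, and software tool.
kg = ResearchKG()
for node, cls in [("ex:paper1", "ex:Publication"), ("ex:dataset1", "ex:Dataset"),
                  ("ex:model1", "ex:Model"), ("ex:tool1", "ex:Software")]:
    kg.add(node, RDF_TYPE, cls)
kg.add("ex:paper1", "ex:introduces", "ex:model1")
kg.add("ex:model1", "ex:trainedOn", "ex:dataset1")
kg.add("ex:model1", "ex:implementedIn", "ex:tool1")

print(kg.subjects("ex:trainedOn", "ex:dataset1"))  # ['ex:model1']
print(kg.linked_artifacts("ex:paper1"))  # ['ex:dataset1', 'ex:model1', 'ex:tool1']
```

Indexing triples by both subject and object lets the traversal follow links in either direction. This mirrors how an RKG can lead from a publication to its datasets and software, or back from a dataset to the work that uses it.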