Analysing the coverage of the University of Bologna's publication metadata in an existing source of open research information

📅 2025-01-10
🤖 AI Summary
This study evaluates how well the publications recorded in the University of Bologna's CRIS (the IRIS platform) are covered in OpenCitations, both as publication entities and as citation links. Method: The authors filter and transform data dumps of IRIS and OpenCitations, producing new datasets on which the coverage and citation analyses are run. Contribution/Results: Only 37.7% of IRIS publications are covered in OpenCitations, with journal articles showing the highest coverage, and 4,290,096 citation links in OpenCitations point to UNIBO IRIS publications. A purely quantitative comparison with Scopus and Web of Science shows a small gap in the average number of citations per bibliographic resource, although the authors note that updated data are needed to confirm this.

📝 Abstract
This study focuses on analysing the coverage of publications' metadata available in the Current Research Information System (CRIS) infrastructure of the University of Bologna (UNIBO), implemented by the IRIS platform, within an authoritative source of open research information, i.e. OpenCitations. The analysis considers data regarding the publication entities alongside the citation links. We precisely quantify the proportion of UNIBO IRIS publications included in OpenCitations, examine their types, and evaluate the number of citations in OpenCitations that involve IRIS publications. Our methodology filters and transforms data dumps of IRIS and OpenCitations, creating novel datasets used for the analysis. Our findings reveal that only 37.7% of IRIS is covered in OpenCitations, with journal articles exhibiting the highest coverage. We identified 4,290,096 citation links pointing to UNIBO IRIS publications. From a purely quantitative perspective, comparing our results with broader proprietary services like Scopus and Web of Science reveals a small gap in the average number of citations per bibliographic resource. However, further analysis with updated data is required to support this speculation.
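The coverage and citation counts described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes publications are matched by DOI only, that IRIS records are simple dictionaries with hypothetical `type` and `doi` keys, and that citations arrive as (citing DOI, cited DOI) pairs. All function names and the toy data are invented for illustration.

```python
from collections import Counter


def normalise_doi(doi):
    """Lower-case a DOI and strip common prefixes so equal DOIs compare equal."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def coverage_by_type(iris_records, oc_dois):
    """Return ({type: (covered, total)}, overall share of covered records).

    Assumption: a record counts as covered when its DOI appears in oc_dois.
    Records without a DOI count towards the total but can never be covered.
    """
    oc = {normalise_doi(d) for d in oc_dois}
    total, covered = Counter(), Counter()
    for rec in iris_records:
        total[rec["type"]] += 1
        doi = rec.get("doi")
        if doi and normalise_doi(doi) in oc:
            covered[rec["type"]] += 1
    per_type = {t: (covered[t], total[t]) for t in total}
    n_records = sum(total.values())
    overall = sum(covered.values()) / n_records if n_records else 0.0
    return per_type, overall


def count_incoming_citations(citation_pairs, iris_dois):
    """Count (citing, cited) links whose cited DOI belongs to an IRIS record."""
    targets = {normalise_doi(d) for d in iris_dois}
    return sum(1 for _citing, cited in citation_pairs
               if normalise_doi(cited) in targets)


# Toy data (hypothetical, not taken from the paper).
iris = [
    {"type": "journal article", "doi": "10.1000/A1"},
    {"type": "journal article", "doi": "https://doi.org/10.1000/a2"},
    {"type": "book chapter", "doi": None},
    {"type": "book chapter", "doi": "10.1000/c1"},
]
oc_dois = ["10.1000/a1", "10.1000/a2", "10.1000/zzz"]
citations = [
    ("10.1000/x", "10.1000/a1"),
    ("10.1000/y", "10.1000/A1"),
    ("10.1000/a1", "10.1000/zzz"),
]

per_type, overall = coverage_by_type(iris, oc_dois)
incoming = count_incoming_citations(
    citations, [r["doi"] for r in iris if r["doi"]]
)
print(per_type, overall, incoming)
```

In this toy case both journal articles are covered and neither book chapter is, which mirrors the qualitative pattern reported in the abstract. Real dumps would need matching on further identifier types and far more careful cleaning than a DOI prefix strip.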
Problem

Research questions and friction points this paper is trying to address.

OpenCitations Corpus
Bibliometric Analysis
Comparison with Scopus and Web of Science
Innovation

Methods, ideas, or system contributions that make the work stand out.

OpenCitations Corpus
Bologna University Publications
Citation Analysis
Erica Andreose
Digital Humanities and Digital Knowledge, Department of Classical Philology and Italian Studies, University of Bologna, Bologna, Italy
Salvatore Di Marzo
Digital Humanities and Digital Knowledge, Department of Classical Philology and Italian Studies, University of Bologna, Bologna, Italy
Ivan Heibi
University of Bologna
Semantic Publishing, Semantic Web, Data Visualisation, Web technologies
Silvio Peroni
University of Bologna
Semantic Publishing, Semantic Web, Open Science, Science of Science, Scholarly Communication
Leonardo Zilli
Unknown affiliation