Analysis of the Publication and Document Types in OpenAlex, Web of Science, Scopus, Pubmed and Semantic Scholar

📅 2024-06-21
🏛️ arXiv.org
📈 Citations: 9
Influential: 3
🤖 AI Summary
Inconsistent document-type definitions across academic databases hinder cross-database comparability of bibliometric indicators, limiting their utility in research evaluation. This study presents the first systematic comparison of document-type classification schemes across five major platforms—OpenAlex, Web of Science, Scopus, PubMed, and Semantic Scholar—employing large-scale metadata extraction, cross-database label mapping, quantitative consistency assessment, and expert validation. Results reveal substantial structural disagreement regarding the classification of “research articles,” with OpenAlex exhibiting broad coverage but coarse-grained typology, necessitating rule-based calibration for alignment. The study delineates OpenAlex’s applicability boundaries and optimization pathways for bibliometric use, establishing a methodological foundation and empirical evidence for standardizing bibliometric practices in open science contexts.

📝 Abstract
This study compares and analyses publication and document types in the following bibliographic databases: OpenAlex, Scopus, Web of Science, Semantic Scholar and PubMed. The results demonstrate that typologies can differ considerably between individual database providers. Moreover, the distinction between research and non-research texts, which is required to identify relevant documents for bibliometric analysis, can vary depending on the data source because publications are classified differently in the respective databases. The focus of this study, in addition to the cross-database comparison, is primarily on the coverage and analysis of the publication and document types contained in OpenAlex, as OpenAlex is becoming increasingly important as a free alternative to established proprietary providers for bibliometric analyses at libraries and universities.
Problem

Research questions and friction points this paper is trying to address.

Compare document type classification across five scholarly databases
Analyze data variation due to different curation strategies and taxonomies
Investigate discrepancies in publication type assignment (e.g., conference proceedings)
Innovation

Methods, ideas, or system contributions that make the work stand out.

Compare document types across multiple databases
Analyze taxonomy and data curation differences
Use shared corpus for classification comparison
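The shared-corpus idea can be illustrated with a minimal sketch. It assumes two databases have already been matched on DOI and that each record carries a single document-type label. The DOIs, the toy records, the `RESEARCH_TYPES` mapping, and the `compare` helper are all hypothetical and are not taken from the paper; in particular, which labels count as "research" is an assumption made here for illustration.

```python
from collections import Counter

# ASSUMPTION: an illustrative research / non-research mapping per database.
# The paper's actual mapping is not reproduced here.
RESEARCH_TYPES = {
    "openalex": {"article", "review"},
    "scopus": {"Article", "Review", "Conference Paper"},
}

def is_research(db, label):
    """Return True if `label` counts as a research type in `db` (per the assumed mapping)."""
    return label in RESEARCH_TYPES[db]

def compare(records_a, records_b, db_a, db_b):
    """Cross-tabulate labels on the shared corpus and compute binary agreement."""
    shared = sorted(records_a.keys() & records_b.keys())  # DOIs present in both sources
    crosstab = Counter((records_a[doi], records_b[doi]) for doi in shared)
    agree = sum(
        is_research(db_a, records_a[doi]) == is_research(db_b, records_b[doi])
        for doi in shared
    )
    share = agree / len(shared) if shared else 0.0
    return crosstab, share

# Toy records keyed by made-up DOIs.
openalex = {"10.1/a": "article", "10.1/b": "article", "10.1/c": "review", "10.1/d": "article"}
scopus = {"10.1/a": "Article", "10.1/b": "Editorial", "10.1/c": "Review", "10.1/e": "Letter"}

crosstab, share = compare(openalex, scopus, "openalex", "scopus")
for (label_a, label_b), n in sorted(crosstab.items()):
    print(f"{label_a:10s} {label_b:10s} {n}")
print(f"research/non-research agreement: {share:.2f}")
```

In this toy data, the record labelled "article" in one source and "Editorial" in the other is the kind of disagreement that changes which documents enter a bibliometric analysis.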
Nick Haupka
Göttingen State and University Library, University of Göttingen
Jack H. Culbert
GESIS – Leibniz Institute for the Social Sciences
Alexander Schniedermann
German Centre for Higher Education Research and Science Studies
Sociology of Science, Bibliometrics, Scientometrics, Systematic Reviews, Reporting Guidelines
N. Jahn
Göttingen State and University Library, University of Göttingen
Philipp Mayr
GESIS – Leibniz Institute for the Social Sciences
Interactive Information Retrieval, Informetrics, Digital libraries, Information Seeking, Dataset Search