Analysis of the Publication and Document Types in OpenAlex, Web of Science, Scopus, Pubmed and Semantic Scholar

📅 2024-06-21

🏛️ arXiv.org

📈 Citations: 9

✨ Influential: 3

🤖 AI Summary

Inconsistent document-type definitions across academic databases make bibliometric indicators hard to compare between databases, which limits their usefulness in research evaluation. This study presents the first systematic comparison of document-type classification schemes across five major platforms (OpenAlex, Web of Science, Scopus, PubMed, and Semantic Scholar), using large-scale metadata extraction, cross-database label mapping, quantitative consistency assessment, and expert validation. The results reveal substantial disagreement over which records count as “research articles”. OpenAlex offers broad coverage but a coarse-grained typology, so rule-based calibration is needed to align it with the other databases. The study outlines where OpenAlex is suitable for bibliometric use and how that use could be improved, providing a methodological foundation and empirical evidence for standardizing bibliometric practice in open science.

📝 Abstract

This study compares and analyses publication and document types in the following bibliographic databases: OpenAlex, Scopus, Web of Science, Semantic Scholar and PubMed. The results demonstrate that typologies can differ considerably between individual database providers. Moreover, the distinction between research and non-research texts, which is required to identify relevant documents for bibliometric analysis, can vary depending on the data source because publications are classified differently in the respective databases. In addition to the cross-database comparison, the study focuses primarily on the coverage and analysis of the publication and document types contained in OpenAlex, since OpenAlex is becoming increasingly important as a free alternative to established proprietary providers for bibliometric analyses at libraries and universities.
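The abstract notes that the research/non-research split depends on how each database labels a record. The minimal Python sketch below illustrates that effect under a stated assumption: RESEARCH_TYPES is a hypothetical rule of which native labels count as "research" in each database (mostly articles and reviews here). It is an illustrative choice, not the mapping used in the study, and the label strings only approximate each provider's vocabulary, so they should be checked against current documentation.

```python
# Hypothetical rule: native document-type labels treated as "research"
# in each database. Illustrative only; NOT the mapping used in the study.
RESEARCH_TYPES: dict[str, set[str]] = {
    "openalex": {"article", "review"},
    "scopus": {"Article", "Review", "Conference Paper"},
    "wos": {"Article", "Review", "Proceedings Paper"},
    "pubmed": {"Journal Article", "Review"},
    "semantic_scholar": {"JournalArticle", "Review", "Conference"},
}


def is_research(database: str, doc_type: str) -> bool:
    """Return True if this native label counts as research in this database."""
    return doc_type in RESEARCH_TYPES[database]


def classify_record(labels: dict[str, str]) -> dict[str, bool]:
    """Classify one publication given its native type label in each database."""
    return {db: is_research(db, label) for db, label in labels.items()}


# Hypothetical record: the same publication labelled differently by two providers.
example = {"openalex": "article", "wos": "Editorial Material"}
print(classify_record(example))
```

With this rule, the same publication counts as research in one source and as non-research in the other. That is the kind of discrepancy the study reports, although the actual labels and rules differ by database.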

Problem

Research questions and friction points this paper is trying to address.

Compare document type classification across five scholarly databases

Analyze data variation due to different curation strategies and taxonomies

Investigate discrepancies in publication type assignment (e.g., conference proceedings)

Innovation

Methods, ideas, or system contributions that make the work stand out.

Compare document types across multiple databases

Analyze taxonomy and data curation differences

Use a shared corpus for classification comparison
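A shared-corpus comparison can be sketched as follows. Records from two databases are matched on normalized DOIs, and the pairs of native type labels are counted for the publications that appear in both. This is a minimal Python sketch, not the study's pipeline. It assumes that each source has already been reduced to a DOI-to-type mapping, and the DOIs and labels in the example are made up for illustration.

```python
from __future__ import annotations

from collections import Counter


def normalize_doi(doi: str) -> str:
    """Lower-case a DOI and strip a common resolver prefix, if present."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def crosstab(a: dict[str, str], b: dict[str, str]) -> Counter[tuple[str, str]]:
    """Count (type in A, type in B) pairs over DOIs present in both sources."""
    a_norm = {normalize_doi(k): v for k, v in a.items()}
    b_norm = {normalize_doi(k): v for k, v in b.items()}
    shared = a_norm.keys() & b_norm.keys()  # the shared corpus
    return Counter((a_norm[d], b_norm[d]) for d in shared)


# Hypothetical DOI -> document-type mappings for two databases.
openalex = {
    "https://doi.org/10.1000/A1": "article",
    "10.1000/a2": "article",
    "10.1000/a3": "review",
    "10.1000/a4": "article",  # not in the second source, so excluded
}
wos = {"10.1000/a1": "Article", "10.1000/a2": "Editorial Material", "10.1000/a3": "Review"}

table = crosstab(openalex, wos)
for (type_a, type_b), n in sorted(table.items()):
    print(f"{type_a:10} {type_b:20} {n}")
```

Off-diagonal cells such as ("article", "Editorial Material") mark records where the two typologies disagree. Counting raw label pairs, rather than mapping to a common scheme first, keeps those disagreements visible.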

🔎 Similar Papers

Coverage and metadata completeness and accuracy of African research publications in OpenAlex: A comparative analysis