Investigating Document Type, Language, Publication Year, and Author Count Discrepancies Between OpenAlex and Web of Science

📅 2025-08-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the problem of systematic metadata discrepancies between OpenAlex and Web of Science (WoS) and their potential impact on bibliometric analyses. We systematically compare consistency across four critical metadata dimensions—document type, publication year, language, and author count—via cross-database citation matching, rigorous data cleaning, and multidimensional quantitative consistency assessment. Our method enables the first comprehensive, empirical evaluation of metadata quality differences between these two major scholarly databases. Key findings reveal distinct error patterns: OpenAlex exhibits significant overestimation of author counts and misclassification of document types, whereas WoS underrepresents non-English publications. Year misalignment and language mislabeling further compound inter-database inconsistencies. These results provide empirical evidence and methodological guidance for database selection, interpretation of bibliometric indicators, and metadata curation in research evaluation and science policy.
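The summary above describes matching records across the two databases and then measuring how consistent four metadata fields are. The Python sketch below shows one way such a comparison could be set up. It is a hypothetical illustration under stated assumptions, not the authors' pipeline.

The sketch makes these assumptions:
- Records are flat objects (`Record`) carrying a DOI, document type, year, language and author count.
- Records are matched on normalized DOI. This is a common strategy, but the study's actual matching procedure is not described here.
- Language values have already been normalized to the same codes in both sources.
- `TYPE_MAP` is a small illustrative crosswalk from WoS document types to OpenAlex-style types, not an official mapping.

```python
# Hypothetical sketch: pair OpenAlex and WoS records by DOI and measure
# per-field agreement. Field names and TYPE_MAP are illustrative assumptions.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    doi: Optional[str]
    doc_type: str
    year: int
    language: str  # assumed already normalized to a shared code, e.g. "en"
    n_authors: int


def normalize_doi(doi: Optional[str]) -> Optional[str]:
    """Lower-case a DOI and strip common URL/prefix forms; None if missing."""
    if not doi:
        return None
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi or None


def match_by_doi(openalex, wos):
    """Pair records whose normalized DOIs coincide; records without a DOI are skipped.
    If several WoS records share a DOI, the last one seen is kept."""
    wos_index = {}
    for rec in wos:
        key = normalize_doi(rec.doi)
        if key is not None:
            wos_index[key] = rec
    pairs = []
    for rec in openalex:
        key = normalize_doi(rec.doi)
        if key is not None and key in wos_index:
            pairs.append((rec, wos_index[key]))
    return pairs


# Hypothetical crosswalk from WoS document types to OpenAlex-style types.
TYPE_MAP = {"Article": "article", "Review": "review", "Letter": "letter",
            "Editorial Material": "editorial", "Correction": "erratum"}


def agreement(pairs):
    """Share of matched pairs on which each field agrees."""
    if not pairs:
        return {}
    counts = {"doc_type": 0, "year": 0, "language": 0, "n_authors": 0}
    for oa, ws in pairs:
        counts["doc_type"] += oa.doc_type == TYPE_MAP.get(ws.doc_type)
        counts["year"] += oa.year == ws.year
        counts["language"] += oa.language == ws.language
        counts["n_authors"] += oa.n_authors == ws.n_authors
    return {field: n / len(pairs) for field, n in counts.items()}


def author_count_direction(pairs):
    """Count pairs where OpenAlex lists more, fewer, or the same number of authors."""
    tally = {"more": 0, "fewer": 0, "equal": 0}
    for oa, ws in pairs:
        if oa.n_authors > ws.n_authors:
            tally["more"] += 1
        elif oa.n_authors < ws.n_authors:
            tally["fewer"] += 1
        else:
            tally["equal"] += 1
    return tally


if __name__ == "__main__":
    # Toy records for illustration only; not data from the study.
    oa = [Record("https://doi.org/10.1000/ABC", "article", 2021, "en", 5),
          Record("10.1000/xyz", "review", 2020, "en", 3)]
    ws = [Record("10.1000/abc", "Article", 2020, "en", 4),
          Record("10.1000/XYZ", "Review", 2020, "en", 3)]
    pairs = match_by_doi(oa, ws)
    print(agreement(pairs))
    print(author_count_direction(pairs))
```

Reporting the direction of author-count differences separately from simple agreement helps distinguish systematic over- or under-counting from random noise. The same idea extends to publication year, for example by tallying off-by-one-year cases.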

📝 Abstract
Bibliometrics, whether used for research or research evaluation, relies on large multidisciplinary databases of research outputs and citation indices. The Web of Science (WoS) was the main supporting infrastructure of the field for more than 30 years until several new competitors emerged. OpenAlex, a bibliographic database launched in 2022, has distinguished itself by its openness and extensive coverage. While OpenAlex may reduce or eliminate barriers to accessing bibliometric data, a concern that hinders its broader adoption for research and research evaluation is the quality of its metadata. This study aims to assess metadata quality in OpenAlex and WoS, focusing on document type, publication year, language, and number of authors. By addressing discrepancies and misattributions in metadata, this research seeks to enhance awareness of data quality issues that could impact bibliometric research and evaluation outcomes.
Problem

Assessing metadata quality discrepancies between OpenAlex and Web of Science
Investigating differences in document type, language, publication year, and author count
Evaluating how metadata errors impact bibliometric research outcomes
Innovation

Comparative metadata quality assessment
Analyzing document type and author count discrepancies
Evaluating OpenAlex versus Web of Science coverage
Philippe Mongeon
Department of Information Science, Dalhousie University
Quantitative Science Studies, Scholarly Communication, Bibliometrics
Madelaine Hare
Dalhousie University
Quantitative Science Studies, Scholarly Communication, Bibliometrics
Poppy Riddle
Department of Information Science, Dalhousie University
Summer Wilson
Department of Information Science, Dalhousie University
Geoff Krause
Department of Information Science, Dalhousie University
Rebecca Marjoram
Department of Information Science, Dalhousie University
Rémi Toupin
Department of Information Science, Dalhousie University