One in Eight OpenAlex Abstracts Has Integrity Issues

📅 2026-05-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study examines the integrity of abstracts in OpenAlex, a widely used scholarly metadata database. Abstract quality in such databases has not been systematically assessed, which threatens the reliability of computational metascience research that treats abstracts as primary data. Using a two-stage annotation protocol that combines human expert review with large language model classification, the authors evaluate 10,000 randomly sampled English-language journal abstracts, identify seven distinct failure modes, and find that 12% have integrity issues. The most prevalent problems are insufficient content and misplaced metadata. The authors also discuss implications for downstream research and describe a forthcoming community portal to support collective annotation efforts.

📝 Abstract

Scientific abstracts are increasingly used as primary data in computational metascience research, yet the quality of these abstracts in widely used bibliographic databases has not been systematically examined. We assess the integrity of 10,000 randomly sampled English-language journal abstracts from OpenAlex using a two-stage annotation protocol combining human expert review and large language model classification. We identify seven distinct failure modes and find that 12% of abstracts have integrity issues, with insufficient content and misplaced metadata being the most prevalent. We discuss implications for downstream research and describe a forthcoming community portal to support collective annotation efforts.
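
To give a sense of how precise a 12% estimate from 10,000 sampled abstracts is, the sketch below computes a 95% Wilson score interval for the proportion. It is illustrative only and not part of the paper: the count of 1,200 flagged abstracts is an assumption back-calculated from the rounded 12% figure (the exact count is not given here), and `wilson_interval` is a hypothetical helper name.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Return the Wilson score interval (lower, upper) for a binomial proportion.

    z = 1.96 corresponds to an approximately 95% confidence level.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")

    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    # ASSUMPTION: 12% of 10,000 is taken as exactly 1,200 flagged abstracts.
    flagged, sample_size = 1200, 10_000
    lower, upper = wilson_interval(flagged, sample_size)
    print(f"Estimated rate: {flagged / sample_size:.1%}")
    print(f"95% Wilson interval: {lower:.2%} to {upper:.2%}")
```

Under this assumed count, the interval spans roughly 11.4% to 12.7%, which is consistent with the title's "one in eight" (12.5%) as a rounding of the reported 12%. The interval reflects sampling error only and says nothing about annotation error in either stage of the protocol.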

Problem

Research questions and friction points this paper is trying to address.

scientific abstracts

data integrity

bibliographic databases

metascience

OpenAlex

Innovation

Methods, ideas, or system contributions that make the work stand out.

abstract integrity

two-stage annotation

large language model classification

metascience

data quality
