BioMedJImpact: A Comprehensive Dataset and LLM Pipeline for AI Engagement and Scientific Impact Analysis of Biomedical Journals

2025-11-16

AI Summary
Prior research lacks a systematic characterization of the co-evolutionary dynamics between AI involvement and biomedical journal prestige. Method: We construct a large-scale dataset of 1.74 million papers, integrating bibliometric indicators, co-authorship network topology, and LLM-driven AI content identification via a three-stage semantic pipeline for content-aware quantification of AI involvement and joint modeling with collaboration diversity. Results: High collaboration intensity and broad AI participation significantly enhance journal citation impact; AI features robustly predict journal quartiles, with consistent effects pre- and post-pandemic; human evaluation confirms >92% accuracy of the LLM pipeline. The study delivers a reproducible scientometric framework, establishing a novel paradigm for evaluating scholarly impact in the AI era.

Abstract
Assessing journal impact is central to scholarly communication, yet existing open resources rarely capture how collaboration structures and artificial intelligence (AI) research jointly shape venue prestige in biomedicine. We present BioMedJImpact, a large-scale, biomedical-oriented dataset designed to advance journal-level analysis of scientific impact and AI engagement. Built from 1.74 million PubMed Central articles across 2,744 journals, BioMedJImpact integrates bibliometric indicators, collaboration features, and LLM-derived semantic indicators for AI engagement. Specifically, the AI engagement feature is extracted through a reproducible three-stage LLM pipeline that we propose. Using this dataset, we analyze how collaboration intensity and AI engagement jointly influence scientific impact across pre- and post-pandemic periods (2016-2019, 2020-2023). Two consistent trends emerge: journals with higher collaboration intensity, particularly those with larger and more diverse author teams, tend to achieve greater citation impact, and AI engagement has become an increasingly strong correlate of journal prestige, especially in quartile rankings. To further validate the three-stage LLM pipeline we proposed for deriving the AI engagement feature, we conduct human evaluation, confirming substantial agreement in AI relevance detection and consistent subfield classification. Together, these contributions demonstrate that BioMedJImpact serves as both a comprehensive dataset capturing the intersection of biomedicine and AI, and a validated methodological framework enabling scalable, content-aware scientometric analysis of scientific impact and innovation dynamics. Code is available at https://github.com/JonathanWry/BioMedJImpact.
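The journal-level analysis described in the abstract can be pictured as an aggregation of per-article records into per-journal features for each period. The sketch below is only an illustration under stated assumptions: the record fields (`journal`, `year`, `n_authors`, `is_ai`) and the toy data are hypothetical and are not the dataset's actual schema, and the two features computed here (share of AI-related articles, mean team size) are simple stand-ins for the richer indicators the dataset provides. Only the period boundaries (2016-2019 and 2020-2023) come from the abstract.

```python
from collections import defaultdict

# Hypothetical per-article records; field names and values are illustrative only.
articles = [
    {"journal": "J1", "year": 2015, "n_authors": 3, "is_ai": False},  # outside both periods
    {"journal": "J1", "year": 2017, "n_authors": 4, "is_ai": False},
    {"journal": "J1", "year": 2018, "n_authors": 6, "is_ai": True},
    {"journal": "J1", "year": 2021, "n_authors": 8, "is_ai": True},
    {"journal": "J1", "year": 2022, "n_authors": 10, "is_ai": True},
    {"journal": "J2", "year": 2019, "n_authors": 2, "is_ai": False},
    {"journal": "J2", "year": 2020, "n_authors": 3, "is_ai": False},
]


def period_of(year):
    """Map a publication year to the pre- or post-pandemic period, or None."""
    if 2016 <= year <= 2019:
        return "pre"
    if 2020 <= year <= 2023:
        return "post"
    return None


def journal_features(records):
    """Aggregate article records into per-(journal, period) features."""
    buckets = defaultdict(list)
    for rec in records:
        period = period_of(rec["year"])
        if period is not None:
            buckets[(rec["journal"], period)].append(rec)

    features = {}
    for key, rows in buckets.items():
        n = len(rows)
        features[key] = {
            "n_articles": n,
            # Fraction of articles flagged as AI-related (assumed boolean flag).
            "ai_share": sum(r["is_ai"] for r in rows) / n,
            # Mean number of authors per article, a crude collaboration proxy.
            "mean_team_size": sum(r["n_authors"] for r in rows) / n,
        }
    return features


features = journal_features(articles)
for key in sorted(features):
    print(key, features[key])
```

Features of this shape could then be compared against citation metrics or quartile rankings separately for each period, which is the kind of comparison the abstract reports.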
Problem

Research questions and friction points this paper is trying to address.

Analyzing how collaboration structures and AI research jointly influence biomedical journal prestige
Developing a reproducible LLM pipeline for extracting AI engagement features from scientific literature
Investigating the relationship between collaboration intensity, AI engagement, and scientific impact metrics
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM pipeline extracts AI engagement features
Dataset integrates bibliometric and semantic indicators
Methodology enables scalable content-aware scientometric analysis
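The abstract reports that human evaluation confirmed agreement with the pipeline's AI relevance labels. One common way to quantify such agreement is raw agreement together with Cohen's kappa, which corrects for chance. The sketch below assumes this metric purely for illustration; the source does not state which agreement statistic the authors used, and the label lists are made-up toy data.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Return (raw agreement, Cohen's kappa) for two equal-length label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")

    n = len(labels_a)
    # Observed agreement: fraction of items labelled identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected chance agreement from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    categories = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label; kappa is conventionally 1.
        return observed, 1.0
    return observed, (observed - expected) / (1.0 - expected)


# Toy data (hypothetical): 1 = AI-related, 0 = not AI-related.
llm_labels = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
human_labels = [1, 1, 0, 0, 1, 0, 0, 0, 1, 1]

agreement, kappa = cohen_kappa(llm_labels, human_labels)
print(f"raw agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

Reporting kappa alongside raw agreement matters when one class dominates, since two raters can agree often by chance alone.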