Can tweets predict article retractions? A comparison between human and LLM labelling

📅 2024-03-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates whether social media commentary (specifically Twitter posts) can serve as an early warning signal for scientific paper retractions. Method: using 4,354 tweets associated with 504 retracted papers, the authors compare the efficacy of human annotation, large language models (GPT-4o-mini, Gemini 1.5 Flash, Claude-3.5-Haiku), and lexicon-based sentiment analysis (TextBlob) in detecting critical signals preceding retraction. Contribution/Results: manual labelling showed that 25.7% of tweets signalled problems before the official retraction, and the LLMs outperformed TextBlob in detection accuracy, offering empirical evidence that generative AI can support early research integrity monitoring. However, only 11.1% of retracted papers received critical attention on Twitter before retraction, revealing substantial coverage limitations. The study outlines a “social data + generative AI” surveillance approach, offering a methodological foundation for proactive identification of potentially problematic articles.
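The lexicon-based baseline mentioned above (TextBlob) can be illustrated with a short Python snippet. This is a minimal sketch, not the paper's pipeline: the example tweets and the polarity cut-off used to flag a tweet as critical are assumptions made here for illustration.

```python
# Minimal sketch of a lexicon-based baseline with TextBlob (not the paper's exact pipeline).
from textblob import TextBlob

# Illustrative tweets; the paper's dataset of 4,354 tweet mentions is not reproduced here.
tweets = [
    "The statistics in this paper look seriously flawed, and two figures appear duplicated.",
    "Great read, congratulations to the authors!",
]

POLARITY_THRESHOLD = 0.0  # assumed cut-off: negative polarity -> potentially critical

for text in tweets:
    polarity = TextBlob(text).sentiment.polarity  # ranges from -1.0 (negative) to +1.0 (positive)
    label = "critical" if polarity < POLARITY_THRESHOLD else "not critical"
    print(f"{polarity:+.2f}  {label}  {text[:60]}")
```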

📝 Abstract
Quickly detecting problematic research articles is crucial to safeguarding the integrity of scientific research. This study explores whether Twitter mentions of retracted articles can signal potential problems with the articles prior to their retraction, potentially serving as an early warning system for scholars. To investigate this, we analysed a dataset of 4,354 Twitter mentions associated with 504 retracted articles. The effectiveness of Twitter mentions in predicting article retractions was evaluated by both manual and Large Language Model (LLM) labelling. Manual labelling results indicated that 25.7% of tweets signalled problems before retraction. Using the manual labelling results as the baseline, we found that LLMs (GPT-4o-mini, Gemini 1.5 Flash, and Claude-3.5-Haiku) outperformed lexicon-based sentiment analysis tools (e.g., TextBlob) in detecting potential problems, suggesting that automatic detection of problematic articles from social media using LLMs is technically feasible. Nevertheless, since only a small proportion of retracted articles (11.1%) were criticised on Twitter prior to retraction, such automatic systems would detect only a minority of problematic articles. Overall, this study offers insights into how social media data, coupled with emerging generative AI techniques, can support research integrity.
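For the LLM side of the comparison, labelling a tweet amounts to asking a model such as GPT-4o-mini whether the tweet signals a problem with the mentioned article. The sketch below uses the OpenAI Python client; the prompt wording, the `label_tweet` helper, and the two-way output format are illustrative assumptions, not the paper's actual setup.

```python
# Hedged sketch of LLM-based tweet labelling with GPT-4o-mini; the prompt text is hypothetical.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def label_tweet(tweet: str) -> str:
    """Ask the model whether the tweet signals a problem with the article."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,  # keep the labelling as deterministic as possible
        messages=[
            {
                "role": "system",
                "content": (
                    "You label tweets that mention a scientific article. "
                    "Answer with exactly one word: 'critical' if the tweet signals a potential "
                    "problem with the article (errors, misconduct, calls for retraction), "
                    "otherwise 'not_critical'."
                ),
            },
            {"role": "user", "content": tweet},
        ],
    )
    return response.choices[0].message.content.strip().lower()

print(label_tweet("The control group in this study seems to have been fabricated."))
```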
Problem

Research questions and friction points this paper is trying to address.

Investigating whether tweets mentioning articles can signal retractions before they happen
Comparing human annotation with LLM labelling for identifying tweets that criticise research articles
Evaluating human-AI collaboration for reliable research integrity monitoring
Innovation

Methods, ideas, or system contributions that make the work stand out.

Used human annotation to identify critical tweets
Applied large language models for automated tweet labelling, benchmarked against the manual labels (a sketch of such an evaluation follows this list)
Proposed a human-AI collaborative approach for research integrity monitoring
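Because the manual labels serve as the baseline, agreement between human and automated labelling can be summarised with standard metrics. The snippet below is a minimal sketch using scikit-learn; the label vectors are made up and merely stand in for the paper's annotated tweets.

```python
# Sketch of evaluating automated labels against the manual baseline; the label vectors are invented.
from sklearn.metrics import accuracy_score, cohen_kappa_score, precision_recall_fscore_support

# 1 = tweet signals a problem ("critical"), 0 = it does not.
human_labels = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]  # manual annotation (baseline)
model_labels = [1, 0, 1, 1, 0, 0, 0, 0, 1, 0]  # e.g. GPT-4o-mini or TextBlob output

accuracy = accuracy_score(human_labels, model_labels)
kappa = cohen_kappa_score(human_labels, model_labels)  # chance-corrected agreement
precision, recall, f1, _ = precision_recall_fscore_support(
    human_labels, model_labels, average="binary"
)

print(f"accuracy={accuracy:.2f}  kappa={kappa:.2f}  "
      f"precision={precision:.2f}  recall={recall:.2f}  f1={f1:.2f}")
```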