Opioid Named Entity Recognition (ONER-2025) from Reddit

📅 2025-03-28
🤖 AI Summary
To address the public health crisis of opioid overdose, this study proposes a real-time risk monitoring framework that draws on Reddit data. To handle the lexical ambiguity and implicit entity mentions common in social media discussion of opioids, the authors construct a manually annotated opioid entity dataset from Reddit (331K tokens, eight entity categories) and characterize its domain-specific linguistic patterns. They design a real-time event detection framework that ingests heterogeneous data streams (social media posts, healthcare records, and emergency service reports) and combines machine learning, deep learning, and context-aware transformer models (bert-base-NER and roberta-base). Evaluated with five-fold cross-validation, the transformer models reach 97% accuracy and F1-score, a 10.23% improvement over the random forest baseline (0.88). The work aims to provide a scalable, cross-source monitoring approach for early warning of opioid-related public health threats.
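As a rough illustration of the five-fold cross-validation protocol mentioned above, the sketch below splits a set of items into five near-equal folds and uses each fold once as the held-out test set. The function name `k_fold_indices` is hypothetical, and the sketch assumes contiguous, unshuffled folds over annotated posts; the summary does not state how the authors actually formed their folds.

```python
def k_fold_indices(n_items, k=5):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds.

    Assumption: items are split in their given order, without shuffling
    or stratification.
    """
    # The first (n_items % k) folds receive one extra item.
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_items))
        yield train_idx, test_idx
        start += size


# Toy run: 12 placeholder posts split into 5 folds.
folds = list(k_fold_indices(n_items=12, k=5))
for fold_no, (train_idx, test_idx) in enumerate(folds, start=1):
    print(f"fold {fold_no}: train={len(train_idx)} test={len(test_idx)}")
```

A reported score under this protocol would typically be the average of the per-fold scores.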

📝 Abstract
The opioid overdose epidemic remains a critical public health crisis, particularly in the United States, leading to significant mortality and societal costs. Social media platforms like Reddit provide vast amounts of unstructured data that offer insights into public perceptions, discussions, and experiences related to opioid use. This study applies Natural Language Processing (NLP), specifically Opioid Named Entity Recognition (ONER-2025), to extract actionable information from these platforms. Our research makes four key contributions. First, we created a unique, manually annotated dataset sourced from Reddit, where users share self-reported experiences of opioid use via different administration routes. This dataset contains 331,285 tokens and covers eight major opioid entity categories. Second, we detail our annotation process and guidelines and discuss the challenges of labeling the ONER-2025 dataset. Third, we analyze key linguistic challenges in opioid discussions, including slang, ambiguity, fragmented sentences, and emotionally charged language. Fourth, we propose a real-time monitoring system that processes streaming data from social media, healthcare records, and emergency services to identify overdose events. In 11 experiments evaluated with 5-fold cross-validation, our system combines machine learning, deep learning, and transformer-based language models that use contextual embeddings. Our transformer-based models (bert-base-NER and roberta-base) achieved 97% accuracy and F1-score, outperforming the random forest baseline (RF = 0.88) by 10.23%.
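The abstract does not say how the 10.23% figure is computed. One reading that reproduces it is a relative improvement of the transformer score (0.97) over the random forest score (0.88), rather than an absolute difference of nine percentage points. The short check below works through that arithmetic; the variable names are illustrative only.

```python
rf_score = 0.88           # random forest baseline, as reported
transformer_score = 0.97  # bert-base-NER / roberta-base, as reported

# Absolute gain in score, then the same gain expressed relative to the baseline.
absolute_gain = transformer_score - rf_score
relative_gain_pct = absolute_gain / rf_score * 100

print(f"absolute gain: {absolute_gain:.2f}")
print(f"relative gain: {relative_gain_pct:.2f}%")
```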
Problem

Research questions and friction points this paper is trying to address.

Extract opioid-related information from Reddit using NLP
Analyze linguistic challenges in opioid discussions on social media
Develop real-time monitoring system for overdose events
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses NLP for opioid entity recognition
Creates annotated dataset from Reddit posts
Integrates transformer models for high accuracy
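To make the entity recognition task concrete, the sketch below converts span annotations into per-token BIO tags, a common input format for token classification models such as BERT and RoBERTa. Everything in it is an assumption for illustration: the summary does not name the eight entity categories or the tagging scheme, so the labels `DRUG` and `ROUTE`, the example sentence, and the helper `spans_to_bio` are all hypothetical.

```python
def spans_to_bio(tokens, spans):
    """Convert (start, end_exclusive, label) token spans into BIO tags.

    Assumption: spans do not overlap and use token indices, not characters.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"      # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"      # continuation tokens
    return tags


# Made-up sentence and hypothetical labels, not taken from the ONER-2025 dataset.
tokens = ["took", "two", "oxy", "pills", "by", "mouth"]
spans = [(2, 3, "DRUG"), (4, 6, "ROUTE")]
tags = spans_to_bio(tokens, spans)

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Slang terms and implicit mentions, which the paper highlights as annotation challenges, are exactly the cases where deciding the span boundaries and label is hardest.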
Authors

Muhammad Ahmad
King Fahd University of Petroleum and Minerals
Machine Learning, Computer Vision, Hyperspectral Imaging
Humaira Farid
Concordia University, Montreal
Iqra Ameer
Department of Computer Science, Division of Engineering and Science at Abington, The Pennsylvania State University, University Park, PA, 19001, USA
Muhammad Muzamil
Department of Computer Science, the Islamia University of Bahawalpur, 63100, Pakistan
Ameer Hamza Muhammad Jalal
Department of Computer Science, the Islamia University of Bahawalpur, 63100, Pakistan
Ildar Batyrshin
Instituto Politécnico Nacional
Grigori Sidorov
Professor of Computational Linguistics, Instituto Politécnico Nacional (IPN), Mexico
Computational Linguistics, Natural Language Processing, Artificial Intelligence, Machine Learning