Opioid Named Entity Recognition (ONER-2025) from Reddit

📅 2025-03-28

🤖 AI Summary

To address the public health crisis of opioid overdose, this study proposes a real-time risk monitoring framework that draws on Reddit data. To handle the lexical ambiguity and implicit entity mentions common in social media discussion of opioids, we construct a manually annotated opioid entity dataset from Reddit (331,285 tokens, eight major entity categories) and characterize its domain-specific linguistic patterns. The proposed event detection framework processes streaming data from social media posts, healthcare records, and emergency services, and it integrates machine learning, deep learning, and transformer-based models (bert-base-NER and roberta-base) with contextual embeddings. Evaluated via five-fold cross-validation, the transformer models achieve 97% accuracy and F1-score, a relative improvement of 10.23% over the random forest baseline (0.88). The work proposes a cross-source monitoring approach for early warning of opioid-related public health threats.
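The 10.23% figure can be checked against the two scores the abstract reports (0.97 for the transformer models, 0.88 for random forest). The sketch below assumes the figure is a relative gain over the baseline rather than a difference in percentage points; the function name `relative_improvement` is illustrative and not taken from the paper.

```python
def relative_improvement(new_score: float, baseline_score: float) -> float:
    """Return the gain of new_score over baseline_score as a percentage of the baseline."""
    if baseline_score <= 0:
        raise ValueError("baseline_score must be positive")
    return (new_score - baseline_score) / baseline_score * 100.0


transformer_score = 0.97  # reported for bert-base-NER and roberta-base
rf_score = 0.88           # reported random forest baseline

# Absolute gain is measured in percentage points, relative gain in percent of the baseline.
absolute_gain_points = (transformer_score - rf_score) * 100.0
relative_gain_percent = relative_improvement(transformer_score, rf_score)

print(f"absolute gain: {absolute_gain_points:.2f} points")
print(f"relative gain: {relative_gain_percent:.2f}%")
```

Under this reading the absolute gap is about 9 points, and dividing by the baseline of 0.88 gives the reported relative figure.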

📝 Abstract

The opioid overdose epidemic remains a critical public health crisis, particularly in the United States, causing significant mortality and societal costs. Social media platforms like Reddit provide vast amounts of unstructured data that offer insight into public perceptions, discussions, and experiences related to opioid use. This study applies Natural Language Processing (NLP), specifically Opioid Named Entity Recognition (ONER-2025), to extract actionable information from these platforms. Our research makes four key contributions. First, we created a unique, manually annotated dataset sourced from Reddit, where users share self-reported experiences of opioid use via different administration routes. The dataset contains 331,285 tokens and covers eight major opioid entity categories. Second, we detail our annotation process and guidelines and discuss the challenges of labeling the ONER-2025 dataset. Third, we analyze key linguistic challenges in opioid discussions, including slang, ambiguity, fragmented sentences, and emotionally charged language. Fourth, we propose a real-time monitoring system that processes streaming data from social media, healthcare records, and emergency services to identify overdose events. Our system integrates machine learning, deep learning, and transformer-based language models with contextual embeddings, and we evaluate it in 11 experiments using 5-fold cross-validation. Our transformer-based models (bert-base-NER and roberta-base) achieved 97% accuracy and F1-score, a relative improvement of 10.23% over the random forest baseline (RF = 0.88).
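For readers unfamiliar with the evaluation protocol, the following is a minimal sketch of how a 5-fold split partitions a dataset so that every item is used for testing exactly once. It is not the authors' code: the abstract does not say what unit is split (posts, sentences, or tokens), whether folds are shuffled or stratified, or which library was used, so the helper `k_fold_indices`, the seed, and the toy size of 12 items are all assumptions.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(
    n_items: int, k: int = 5, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k folds."""
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and n_items")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    # Fold sizes differ by at most one item.
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Toy example: 12 hypothetical annotated posts split into 5 folds.
folds = list(k_fold_indices(n_items=12, k=5))
for fold_id, (train, test) in enumerate(folds, start=1):
    print(f"fold {fold_id}: {len(train)} train items, {len(test)} test items")
```

A model is trained on each `train` list and scored on the matching `test` list, and the five scores are then averaged into a single reported figure.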

Problem

Research questions and friction points this paper is trying to address.

Extract opioid-related information from Reddit using NLP

Analyze linguistic challenges in opioid discussions on social media

Develop real-time monitoring system for overdose events
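The page does not describe how the monitoring system decides that an overdose event is occurring. As one possible illustration only, the sketch below counts flagged items inside a sliding time window and raises an alert when the count reaches a threshold. The class `SlidingWindowMonitor`, the one-hour window, and the threshold of three are hypothetical choices, not details from the paper, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque
from typing import Deque


class SlidingWindowMonitor:
    """Alert when the number of flagged events in a time window reaches a threshold."""

    def __init__(self, window_seconds: float, threshold: int) -> None:
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._timestamps: Deque[float] = deque()

    def observe(self, timestamp: float) -> bool:
        """Record one flagged event and return True if the alert condition holds."""
        self._timestamps.append(timestamp)
        # Drop events that are older than the window relative to the newest event.
        while timestamp - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps) >= self.threshold


# Hypothetical stream: timestamps (in seconds) of posts that an upstream
# entity recognizer has flagged as overdose-related.
monitor = SlidingWindowMonitor(window_seconds=3600, threshold=3)
event_times = [0, 1200, 2400, 9000]
alerts = [monitor.observe(t) for t in event_times]
print(alerts)
```

In this toy stream the third event falls within an hour of the first two and triggers the alert, while the fourth arrives after the earlier events have expired from the window.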

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses NLP for opioid entity recognition

Creates annotated dataset from Reddit posts

Integrates transformer models for high accuracy
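Token-level entity recognizers commonly emit one tag per token, which then has to be grouped into entity spans. The sketch below assumes the widely used BIO tagging scheme; the page does not state which scheme ONER-2025 uses, and the label `OPIOID` and the example sentence are hypothetical stand-ins for the dataset's eight entity categories, which are not listed here.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (label, text) entity spans."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    spans: List[Tuple[str, str]] = []
    current_label: Optional[str] = None
    current_tokens: List[str] = []

    def close_span() -> None:
        # Store the open entity, if any.
        if current_label is not None:
            spans.append((current_label, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            close_span()
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            current_tokens.append(token)
        else:
            # "O" or an I- tag that does not continue the open entity ends it.
            close_span()
            current_label, current_tokens = None, []
    close_span()
    return spans


# Hypothetical example with an assumed OPIOID label.
tokens = ["switched", "from", "oxycodone", "to", "fentanyl", "patches"]
tags = ["O", "O", "B-OPIOID", "O", "B-OPIOID", "I-OPIOID"]
print(bio_to_spans(tokens, tags))
```

Entity-level precision, recall, and F1 are typically computed by comparing spans like these between the predicted and gold annotations.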
