Crowdsourcing-Based Knowledge Graph Construction for Drug Side Effects Using Large Language Models with an Application on Semaglutide

📅 2025-04-06
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This study addresses the insufficient detection of real-world adverse drug reaction (ADR) signals in pharmacovigilance. We propose a knowledge graph construction paradigm that integrates crowdsourced data with large language models (LLMs). Methodologically, we extract patient-reported narratives about semaglutide for weight management from social media platforms (e.g., Reddit), use LLMs for entity recognition, relation extraction, and temporal modeling of the unstructured text, and perform cross-source alignment and validation against the FDA Adverse Event Reporting System (FAERS). Key contributions include: (1) the first comprehensive ADR knowledge graph dedicated to multiple branded formulations of semaglutide; (2) identification of 12 temporally distinct safety signals, 78% of which are corroborated by FAERS; and (3) patient-generated narratives that substantially enhance dynamic pharmacoepidemiological insight and knowledge base completion.
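The pipeline summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the LLM extraction step has already produced one record per side-effect mention. The `Mention` record, the relation names (`has_reported_side_effect`, `mentions_brand`, `reports`), and the monthly bucketing are all hypothetical choices. The toy posts are invented for illustration and are not data from the study.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Mention:
    # Hypothetical shape of one LLM extraction result for a single post.
    post_id: str
    brand: str          # semaglutide brand named in the post, e.g. "Ozempic"
    posted: date        # post date, used for temporal analysis
    side_effect: str    # side-effect term as returned by the extraction step


def build_triples(mentions):
    """Turn extraction results into a set of (head, relation, tail) KG triples."""
    triples = set()
    for m in mentions:
        # Brand-level edge: duplicates across posts collapse into one triple.
        triples.add((m.brand, "has_reported_side_effect", m.side_effect))
        # Post-level edges keep provenance back to the source post.
        triples.add((m.post_id, "mentions_brand", m.brand))
        triples.add((m.post_id, "reports", m.side_effect))
    return triples


def counts_by_brand_and_month(mentions):
    """Count side-effect mentions per (brand, year, month) bucket."""
    table = defaultdict(Counter)
    for m in mentions:
        table[(m.brand, m.posted.year, m.posted.month)][m.side_effect] += 1
    return table


# Toy, invented records for illustration only.
mentions = [
    Mention("p1", "Ozempic", date(2023, 5, 2), "nausea"),
    Mention("p2", "Ozempic", date(2023, 5, 20), "nausea"),
    Mention("p3", "Wegovy", date(2023, 6, 1), "fatigue"),
]
kg = build_triples(mentions)
table = counts_by_brand_and_month(mentions)
print(len(kg))                                  # 8
print(table[("Ozempic", 2023, 5)]["nausea"])    # 2
```

Storing triples as a set deduplicates repeated brand-level edges while still keeping per-post provenance. The brand-by-month counts are one simple way to compare reported side effects across brands over time.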

πŸ“ Abstract
Social media is a rich source of real-world data that captures valuable patient experience information for pharmacovigilance. However, mining data from unstructured and noisy social media content remains a challenging task. We present a systematic framework that leverages large language models (LLMs) to extract medication side effects from social media and organize them into a knowledge graph (KG). We apply this framework to semaglutide for weight loss using data from Reddit. Using the constructed knowledge graph, we perform comprehensive analyses to investigate reported side effects across different semaglutide brands over time. These findings are further validated through comparison with adverse events reported in the FAERS database. They provide important patient-centered insights into semaglutide's side effects that complement its established safety profile and the current knowledge base for both healthcare professionals and patients. Our work demonstrates the feasibility of using LLMs to transform social media data into structured KGs for pharmacovigilance.
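The FAERS comparison described in the abstract amounts to checking which side effects extracted from Reddit also appear among FAERS reports. The following is a minimal sketch. It assumes both sources have already been reduced to lists of side-effect terms. The `normalize` and `faers_overlap` helpers are hypothetical. Simple case-folding stands in for the mapping to a controlled vocabulary such as MedDRA that a real alignment step would need. The example terms are illustrative only, not findings from the study.

```python
def normalize(term: str) -> str:
    # Assumption: case-fold and collapse whitespace. A real pipeline would
    # map each term to a standard vocabulary (e.g. MedDRA preferred terms).
    return " ".join(term.lower().split())


def faers_overlap(social_terms, faers_terms):
    """Compare side-effect terms from social media with terms seen in FAERS.

    Returns (shared terms, social-media-only terms, fraction of social-media
    terms that also appear in FAERS).
    """
    social = {normalize(t) for t in social_terms}
    faers = {normalize(t) for t in faers_terms}
    shared = social & faers
    only_social = social - faers
    rate = len(shared) / len(social) if social else 0.0
    return shared, only_social, rate


# Illustrative terms only, not results from the study.
shared, only_social, rate = faers_overlap(
    ["Nausea", "hair  loss", "Fatigue"],
    ["nausea", "fatigue", "vomiting"],
)
print(sorted(shared), sorted(only_social), round(rate, 3))
# ['fatigue', 'nausea'] ['hair loss'] 0.667
```

Returning the social-media-only terms alongside the overlap rate keeps visible the side effects that FAERS does not corroborate. Those terms may be worth closer review rather than being discarded.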
Problem

Research questions and friction points this paper is trying to address.

Extracting drug side effects from noisy social media data
Constructing knowledge graphs using large language models
Validating findings against pharmacovigilance databases
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging LLMs to extract drug side effects
Organizing social media data into knowledge graphs
Validating findings with FAERS database comparisons
Zhijie Duan
University of Pennsylvania, Philadelphia, PA, USA
Kai Wei
Amazon
Computational social science, NLP, SLU
Zhaoqian Xue
University of Pennsylvania
Lingyao Li
University of South Florida, Tampa, FL, USA
Jin Jin
University of Pennsylvania
Biostatistics
Shu Yang
University of Pennsylvania, Philadelphia, PA, USA
Jiayan Zhou
Stanford University
Genomics, Environmental Health Science, Knowledge-based Modeling, Statistical Modeling
Siyuan Ma
Vanderbilt University, Nashville, TN, USA