🤖 AI Summary
To address spurious adverse drug reaction (ADR) signals caused by reporting bias in the FDA Adverse Event Reporting System (FAERS), this paper proposes PFed-Signal, a privacy-preserving federated learning framework. First, biased samples are identified with Euclidean distance-based outlier detection and removed to construct clean local subsets. A Transformer-based model then predicts ADR signals from the cleaned data. The key contributions are: (i) a federated bias-detection mechanism that operates without sharing raw data; (ii) Pfed-Split, a data partitioning strategy that splits heterogeneous, biased clinical reports by ADR; and (iii) ADR-signal, a dual-module model that integrates bias mitigation and signal detection. In experiments, the ROR and PRR computed on the cleaned dataset improve on those of traditional methods, and PFed-Signal outperforms the baselines in accuracy (0.887), F1 score (0.890), recall (0.913), and AUC (0.957). The work aims to support more trustworthy, privacy-aware pharmacovigilance.
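The summary describes the federated bias filter only at a high level. The following is a minimal sketch of one hypothetical way a Euclidean-distance filter could run without pooling raw reports: each client shares only the sum and count of its feature vectors so a server can form a global centroid, then shares the sum and sum of squares of its distances to that centroid so the server can set a cut-off at mean + k·std; each client finally drops its own records beyond the cut-off. The numeric feature vectors, the use of a single global centroid, the mean + k·std rule, and all function names (`local_sum`, `global_threshold`, `federated_filter`, and so on) are assumptions for illustration, not PFed-Signal's actual procedure.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def local_sum(records: Sequence[Vector], dim: int) -> Tuple[Vector, int]:
    """Client round 1 (hypothetical): share only the element-wise sum and the count."""
    total = [0.0] * dim
    for r in records:
        for i, x in enumerate(r):
            total[i] += x
    return total, len(records)


def global_centroid(stats: Sequence[Tuple[Vector, int]], dim: int) -> Vector:
    """Server: combine client sums into the centroid of all records."""
    total = [0.0] * dim
    n = 0
    for s, count in stats:
        for i, x in enumerate(s):
            total[i] += x
        n += count
    return [x / n for x in total]


def local_moments(records: Sequence[Vector], centroid: Vector) -> Tuple[float, float, int]:
    """Client round 2 (hypothetical): share sum and sum of squares of distances."""
    ds = [euclidean(r, centroid) for r in records]
    return sum(ds), sum(d * d for d in ds), len(ds)


def global_threshold(moments: Sequence[Tuple[float, float, int]], k: float) -> float:
    """Server: cut-off = mean + k * population std of all distances."""
    s = sum(m[0] for m in moments)
    sq = sum(m[1] for m in moments)
    n = sum(m[2] for m in moments)
    mean = s / n
    var = max(sq / n - mean * mean, 0.0)  # guard against tiny negative rounding error
    return mean + k * math.sqrt(var)


def federated_filter(clients: Sequence[Sequence[Vector]], k: float = 2.0) -> List[List[Vector]]:
    """Run both rounds and return each client's cleaned subset; raw records stay local."""
    dim = len(next(r for c in clients for r in c))
    centroid = global_centroid([local_sum(c, dim) for c in clients], dim)
    threshold = global_threshold([local_moments(c, centroid) for c in clients], k)
    return [[r for r in c if euclidean(r, centroid) <= threshold] for c in clients]


# Usage with two hypothetical clients; (100, 100) plays the role of a biased report.
hospital_a = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hospital_b = [[0.5, 0.5], [100.0, 100.0]]
cleaned = federated_filter([hospital_a, hospital_b], k=2.0)
print(cleaned[1])  # [[0.5, 0.5]]
```

Here the outlying record sits about 2.24 standard deviations above the mean distance, so the k = 2 cut-off removes it while every other record survives. With the population standard deviation, no single point can lie more than sqrt(n - 1) standard deviations from the mean, so a very small federation would need a smaller k or a robust statistic such as the median.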
📝 Abstract
Adverse drug reactions (ADRs) predicted from biased records in FAERS (the U.S. Food and Drug Administration Adverse Event Reporting System) may mislead online diagnosis. Such problems are generally addressed by optimizing the reporting odds ratio (ROR) or the proportional reporting ratio (PRR). However, these purely statistical methods cannot eliminate the biased data, which leads to inaccurate signal prediction. In this paper, we propose PFed-Signal, a federated learning-based ADR signal prediction model that uses Euclidean distance to eliminate biased data from FAERS, thereby improving the accuracy of ADR prediction. Specifically, we first propose Pfed-Split, a method that partitions the original dataset by ADR. We then propose ADR-signal, an ADR prediction model with two parts: a federated learning-based method for identifying biased data and a Transformer-based ADR prediction model. The former identifies biased data by Euclidean distance and produces a clean dataset by deleting them; the latter is trained on this clean dataset. The results show that the ROR and PRR on the clean dataset are better than those of traditional methods. Furthermore, PFed-Signal achieves an accuracy of 0.887, an F1 score of 0.890, a recall of 0.913, and an AUC of 0.957, all higher than the baselines.
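Because the evaluation compares ROR and PRR on the cleaned data with traditional methods, it may help to recall how these two disproportionality statistics are computed from a 2×2 table of report counts: a (target drug, target ADR), b (target drug, other ADRs), c (other drugs, target ADR), and d (other drugs, other ADRs). The sketch uses the standard formulas ROR = (a/c)/(b/d) and PRR = [a/(a+b)]/[c/(c+d)], plus Woolf's log-scale 95% confidence interval for ROR. The screening rule in the hypothetical `is_signal` helper (at least three cases and a lower bound above 1) is one commonly used convention assumed here, not taken from the paper, and the counts are invented. Zero cells would need a continuity correction, which this sketch omits.

```python
import math
from typing import Tuple


def ror(a: int, b: int, c: int, d: int) -> float:
    """Reporting odds ratio: (a/c) / (b/d), i.e. ad / bc."""
    return (a * d) / (b * c)


def ror_ci95(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """95% CI for ROR on the log scale (Woolf's method); assumes no zero cells."""
    log_ror = math.log(ror(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_ror - 1.96 * se), math.exp(log_ror + 1.96 * se)


def prr(a: int, b: int, c: int, d: int) -> float:
    """Proportional reporting ratio: ADR share among the drug's reports vs. all other drugs."""
    return (a / (a + b)) / (c / (c + d))


def is_signal(a: int, b: int, c: int, d: int, min_cases: int = 3) -> bool:
    """Hypothetical screening rule: enough cases and ROR lower 95% bound above 1."""
    lower, _ = ror_ci95(a, b, c, d)
    return a >= min_cases and lower > 1.0


# Hypothetical counts for one drug-ADR pair.
counts = (20, 980, 100, 98_900)
print(round(ror(*counts), 2), round(prr(*counts), 2), is_signal(*counts))  # 20.18 19.8 True
```

Removing biased reports changes a, b, c, and d, which is why the same formulas can yield different signals before and after cleaning.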