🤖 AI Summary
This study addresses privacy risks in online propaganda detection pipelines, particularly during data collection, feature extraction, and model inference, where widely used approaches often fail to comply with regulations such as GDPR and CCPA. Through a structured review of 162 peer-reviewed publications, the authors propose PROMPT, a framework that formally maps privacy risks to corresponding defense strategies. The framework incorporates a compliance score and a utility function to quantitatively balance privacy preservation, model performance, and deployment cost. Fine-tuning experiments on transformer-based encoders and decoders under synthetic perturbation show that at perturbation parameter q=0.05, F₁ declines by only 1–2%, whereas at q=0.20 the drop reaches 13–14%. The analysis further reveals widespread compliance deficiencies in existing approaches, especially in metadata handling and user-level aggregation.
📝 Abstract
Online propaganda detection pipelines expose measurable privacy risks at multiple stages, including data collection, feature extraction, and model inference. We conduct a structured analysis of $162$ peer-reviewed studies and formalize the problem using the Propaganda Risk Online Mitigation and Privacy-preserving Tactics (PROMPT) framework. PROMPT models risks $R$ and mitigation strategies $S$ through a mapping $M: R\to S$ guided by a utility function $\alpha\cdot \mathrm{PrivacyGain}(s_j) - \beta\cdot \mathrm{PerfLoss}(s_j) - \gamma\cdot \mathrm{Cost}(s_j)$ for each strategy $s_j \in S$, with tunable weights $(\alpha,\beta,\gamma)$ enabling stakeholders to balance privacy, accuracy, and deployment costs. To assess practical adoption, we introduce a compliance score that quantifies how well existing methods align with the requirements of GDPR, CCPA, and similar regulations. Our evaluation shows that many widely used pipelines remain non-compliant, particularly in metadata handling and user-level aggregation. We further present empirical fine-tuning experiments on transformer-based encoders and decoders under synthetic perturbation, demonstrating a monotonic privacy-utility trade-off: at $q = 0.05$, F$_1$ decreased by 1-2%, while at $q = 0.20$ the reduction reached 13-14%. These results establish quantitative baselines for privacy costs in propaganda detection. Our contributions include a formal risk-to-defense mapping, a compliance-oriented auditing metric, and experimental evidence of privacy-performance trade-offs, providing a technical foundation for building regulation-compliant and privacy-aware detection systems.
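
To make the utility-guided mapping $M: R\to S$ concrete, here is a minimal Python sketch. It assumes that $M$ assigns each risk the candidate strategy with the highest utility; the abstract only says the mapping is "guided by" the utility function, so this argmax rule is an assumption. The names `Strategy`, `utility`, `map_risks`, and `CANDIDATES`, and all numeric values, are hypothetical illustrations and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    """A candidate mitigation strategy s_j with illustrative (made-up) scores."""
    name: str
    privacy_gain: float  # PrivacyGain(s_j)
    perf_loss: float     # PerfLoss(s_j)
    cost: float          # Cost(s_j)


def utility(s: Strategy, alpha: float, beta: float, gamma: float) -> float:
    """alpha * PrivacyGain(s_j) - beta * PerfLoss(s_j) - gamma * Cost(s_j)."""
    return alpha * s.privacy_gain - beta * s.perf_loss - gamma * s.cost


def map_risks(candidates: dict[str, list[Strategy]],
              alpha: float, beta: float, gamma: float) -> dict[str, Strategy]:
    """Assumed form of M: R -> S: pick the highest-utility strategy per risk."""
    return {
        risk: max(strategies, key=lambda s: utility(s, alpha, beta, gamma))
        for risk, strategies in candidates.items()
    }


# Hypothetical risk and candidate strategies; the scores are invented.
CANDIDATES = {
    "user_level_aggregation": [
        Strategy("pseudonymize_ids", privacy_gain=0.6, perf_loss=0.02, cost=0.1),
        Strategy("noise_injection", privacy_gain=0.9, perf_loss=0.14, cost=0.3),
    ],
}

if __name__ == "__main__":
    # Equal weights versus a privacy-heavy weighting.
    for weights in [(1.0, 1.0, 1.0), (2.0, 1.0, 1.0)]:
        chosen = map_risks(CANDIDATES, *weights)
        print(weights, {risk: s.name for risk, s in chosen.items()})
```

Changing $(\alpha,\beta,\gamma)$ changes which strategy wins for the same risk, which is the sense in which the weights let stakeholders trade privacy against accuracy and deployment cost.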