SENSOR: An ML-Enhanced Online Annotation Tool to Uncover Privacy Concerns from User Reviews in Social-Media Applications

📅 2025-07-14
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address the low efficiency and coarse granularity of identifying privacy-related issues in large volumes of social media app reviews, this paper proposes a fine-grained privacy intent classification method that automatically sorts reviews into three mutually exclusive classes: privacy-related feature requests, privacy-related bug reports, and privacy-irrelevant comments, which it presents as the first such taxonomy. Methodologically, the authors construct a manually annotated dataset of approximately 16,000 reviews with high inter-rater agreement and design GRACE, an end-to-end model integrating GRU, CBOW word embeddings, and an attention mechanism. In the reported experiments, GRACE performs best among the models tested, with a macro-F1 score of 0.9434 and an accuracy of 95.10%. The key contributions are: (1) a formal definition and modeling of fine-grained privacy intent classification; (2) release of the first high-quality, privacy-intent-specific annotated dataset for mobile applications; and (3) a lightweight, efficient, and interpretable classification framework that enables automated privacy issue triaging for developers.
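The macro-F1 and accuracy figures quoted above can be made concrete with a small sketch. This is a minimal pure-Python illustration of how the two metrics are computed for a three-class problem, not the authors' evaluation code; the label strings in `LABELS` and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical label names for the three classes described in the summary.
LABELS = ("feature_request", "bug_report", "irrelevant")


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1 scores."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # A class with no true or predicted samples scores 0 by convention here.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(labels)


def accuracy(y_true, y_pred):
    """Fraction of reviews whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    y_true = ["feature_request", "bug_report", "irrelevant", "irrelevant"]
    y_pred = ["feature_request", "irrelevant", "irrelevant", "irrelevant"]
    print(macro_f1(y_true, y_pred), accuracy(y_true, y_pred))
```

Because macro-F1 weights every class equally, a model that ignores a rare class is penalized even when its accuracy stays high. In the toy data, the missed bug report pulls macro-F1 well below the accuracy.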

📝 Abstract
The widespread use of social media applications has raised significant privacy concerns, often highlighted in user reviews. These reviews also provide developers with valuable insights into improving apps by addressing issues and introducing better features. However, the sheer volume and nuanced nature of reviews make manual identification and prioritization of privacy-related concerns challenging for developers. Previous studies have developed software utilities to automatically classify user reviews as privacy-relevant, privacy-irrelevant, bug reports, feature requests, etc., using machine learning. Notably, there is a lack of focus on classifying reviews specifically as privacy-related feature requests, privacy-related bug reports, or privacy-irrelevant. This paper introduces SENtinel SORt (SENSOR), an automated online annotation tool designed to help developers annotate and classify user reviews into these categories. To automate the annotation of such reviews, this paper introduces the annotation model GRACE (GRU-based Attention with CBOW Embedding), which combines Gated Recurrent Units (GRU) with Continuous Bag of Words (CBOW) embeddings and an attention mechanism. Approximately 16,000 user reviews from seven popular social media apps on the Google Play Store, namely Instagram, Facebook, WhatsApp, Snapchat, X (formerly Twitter), Facebook Lite, and Line, were analyzed. Two annotators manually labeled the reviews, achieving a Cohen's Kappa value of 0.87, ensuring a labeled dataset with high inter-rater agreement for training machine learning models. Among the models tested, GRACE demonstrated the best performance (macro F1-score: 0.9434, macro ROC-AUC: 0.9934, and accuracy: 95.10%) despite class imbalance. SENSOR demonstrates significant potential to assist developers with extracting and addressing privacy-related feature requests or bug reports from user reviews, enhancing user privacy and trust.
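The inter-rater agreement statistic mentioned in the abstract, Cohen's Kappa, compares the observed agreement p_o between two annotators with the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). The following is a minimal sketch of that standard formula in plain Python. It is not the authors' code, and the function name and toy labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Toy labels: f = feature request, b = bug report, i = irrelevant.
    a = ["f", "f", "b", "i", "i", "i"]
    b = ["f", "b", "b", "i", "i", "i"]
    print(round(cohens_kappa(a, b), 4))
```

Unlike raw percent agreement, kappa discounts matches that two annotators would produce by chance, which matters when one class (such as privacy-irrelevant reviews) dominates the data.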
Problem

Research questions and friction points this paper is trying to address.

Classify privacy-related user reviews in social media apps
Automate annotation of privacy feature requests and bug reports
Address challenges in manual review analysis for developers
Innovation

Methods, ideas, or system contributions that make the work stand out.

GRU-based Attention with CBOW Embedding (GRACE)
Automated online annotation tool (SENSOR)
Analyzes approximately 16,000 user reviews for privacy concerns
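To give a rough sense of the attention component named in GRACE, the sketch below pools a sequence of per-token hidden states (such as GRU outputs) into a single vector using softmax weights. The source does not specify the attention formulation, so the dot-product scoring, the `query` vector, and the function name are all assumptions made for illustration. A real model would learn `query` during training and would typically use a deep-learning framework.

```python
import math


def attention_pool(hidden_states, query):
    """Softmax-weighted sum of per-token hidden states.

    hidden_states: list of T vectors (e.g. GRU outputs), each of length d
    query: length-d scoring vector (learned in a real model; fixed here)
    """
    # Assumed scoring function: dot product between each state and the query.
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    # Subtract the max before exponentiating for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden_states[0])
    pooled = [
        sum(w * h[j] for w, h in zip(weights, hidden_states)) for j in range(dim)
    ]
    return pooled, weights


if __name__ == "__main__":
    states = [[1.0, 0.0], [0.0, 1.0]]
    pooled, weights = attention_pool(states, query=[10.0, 0.0])
    print(pooled, weights)
```

The returned weights sum to 1 and indicate which tokens contributed most to the pooled representation, which is one reason attention-based classifiers are often described as more interpretable.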
Labiba Farah
Software Engineering Lab (SEL), Department of Computer Science and Engineering, Islamic University of Technology (IUT), Boardbazar, Gazipur - 1704, Bangladesh.
Mohammad Ridwan Kabir
Assistant Professor, Department of CSE, Islamic University of Technology (IUT)
Wearable Sensors, Assistive Technologies, Rehabilitation, HCI, Machine Learning
Shohel Ahmed
Assistant Professor of CSE, Islamic University of Technology (IUT)
MD Mohaymen Ul Anam
Software Engineering Lab (SEL), Department of Computer Science and Engineering, Islamic University of Technology (IUT), Boardbazar, Gazipur - 1704, Bangladesh.
Md. Sakibul Islam
Software Engineering Lab (SEL), Department of Computer Science and Engineering, Islamic University of Technology (IUT), Boardbazar, Gazipur - 1704, Bangladesh.