A Large-Scale Privacy Assessment of Android Third-Party SDKs

📅 2024-09-16
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses widespread privacy policy misrepresentation and unauthorized data exfiltration in Android third-party SDKs. It proposes a synergistic evaluation framework that integrates large language model (LLM)-driven semantic parsing of privacy policies with static taint analysis of SDK code. Auditing 158 mainstream SDKs, the study finds that more than 30% lack a publicly available privacy policy; among those that provide one, 37% over-collect user data and 88% make false claims about their access to sensitive data. Cross-platform behavioral comparison and precise taint tracking identify 338 real-world data leakage incidents, and a re-examination 12 months later reveals no improvement in compliance. The key contributions are threefold: (1) the use of LLMs to check consistency between privacy policy text and actual code behavior; (2) empirical evidence exposing systemic policy-code divergence; and (3) three actionable, evidence-based recommendations for regulatory oversight and engineering mitigation. This work establishes a scalable paradigm for rigorous privacy compliance assessment.

📝 Abstract
Third-party Software Development Kits (SDKs) are widely adopted in Android app development to accelerate development pipelines and enhance app functionality. However, this convenience raises substantial concerns about unauthorized access to users' privacy-sensitive information, which could be further abused for illegitimate purposes such as user tracking or monetization. Our study offers a targeted analysis of user privacy protection among Android third-party SDKs, filling a critical gap in the Android software supply chain. It focuses on two aspects of their privacy practices, data exfiltration and behavior-policy compliance (or privacy compliance), utilizing techniques of taint analysis and large language models. It covers 158 widely-used SDKs from two key SDK release platforms, the official one and a large alternative one. From them, we identified 338 instances of privacy data exfiltration. On privacy compliance, our study reveals that more than 30% of the examined SDKs fail to provide a privacy policy disclosing their data handling practices. Among those that do, 37% over-collect user data, and 88% falsely claim access to sensitive data. We revisited the latest versions of the SDKs after 12 months; our analysis demonstrates a persistent lack of improvement in these concerning trends. Based on our findings, we propose three actionable recommendations to mitigate privacy leakage risks and enhance privacy protection for Android users. Our research not only serves as an urgent call for industry attention but also provides crucial insights for future regulatory interventions.
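At its core, the behavior-policy compliance check the abstract describes can be viewed as comparing what a policy declares against what the code is observed collecting. The sketch below is a hypothetical illustration of that comparison as set arithmetic; the data-type names and category labels are invented for the example, not taken from the paper:

```python
# Hypothetical sketch of a behavior-policy compliance check.
# "declared" would come from policy parsing (e.g., LLM extraction);
# "observed" from program analysis (e.g., taint tracking).

def compliance_report(declared: set[str], observed: set[str]) -> dict[str, set[str]]:
    """Compare data types a privacy policy declares against those an
    SDK is observed collecting, and bucket the differences."""
    return {
        # collected in code but never disclosed in the policy
        "undisclosed_collection": observed - declared,
        # disclosed in the policy but never observed in code
        "unobserved_claims": declared - observed,
        # disclosed and actually collected
        "consistent": declared & observed,
    }

report = compliance_report(
    declared={"device_id", "location"},
    observed={"device_id", "contacts", "imei"},
)
print(sorted(report["undisclosed_collection"]))  # ['contacts', 'imei']
```

A real pipeline would of course need a shared taxonomy so that policy phrases ("identifiers associated with your device") and code-level observations (`TelephonyManager` reads) map to the same labels before the comparison is meaningful.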
Problem

Research questions and friction points this paper is trying to address.

Analyzing privacy compliance of Android third-party SDKs
Identifying unauthorized data exfiltration by SDKs
Assessing discrepancies in SDK privacy policy claims
Innovation

Methods, ideas, or system contributions that make the work stand out.

Taint analysis for privacy data tracking
Large language models for compliance checks
Analysis of 158 SDKs from major platforms
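The taint-analysis idea listed above can be sketched as taint propagation over a toy straight-line intermediate representation. This is only a minimal illustration of the source-to-sink principle; the API names, IR shape, and source/sink sets are assumptions for the example, not the paper's actual tool:

```python
# Minimal taint-propagation sketch (illustrative, not the paper's tool).
# Each statement is an (op, dst, src) triple in a toy straight-line IR.

SOURCES = {"getDeviceId", "getLastKnownLocation"}  # privacy-sensitive APIs (assumed)
SINKS = {"sendHttpRequest"}                        # exfiltration points (assumed)

def find_leaks(program):
    tainted = set()  # variables currently holding sensitive data
    leaks = []
    for op, dst, src in program:
        if op == "call_source" and src in SOURCES:
            tainted.add(dst)                 # dst now carries sensitive data
        elif op == "assign" and src in tainted:
            tainted.add(dst)                 # taint flows through assignment
        elif op == "call_sink" and dst in SINKS and src in tainted:
            leaks.append((src, dst))         # tainted value reaches a sink
    return leaks

prog = [
    ("call_source", "id", "getDeviceId"),
    ("assign", "payload", "id"),
    ("call_sink", "sendHttpRequest", "payload"),
]
print(find_leaks(prog))  # [('payload', 'sendHttpRequest')]
```

Production taint analyzers additionally handle branching control flow, aliasing, inter-procedural calls, and sanitizers, which is where most of their engineering complexity lies.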
M. H. Meng
Institute for Infocomm Research, Singapore
Chuan Yan
The University of Queensland, Australia
Yun Hao
National University of Singapore, Singapore
Qing Zhang
ByteDance, China
Zeyu Wang
ByteDance, China
Kailong Wang
Huazhong University of Science and Technology, China
S. G. Teo
Institute for Infocomm Research, Singapore
Guangdong Bai
Associate Professor, The University of Queensland
System Security · Software Security · Trustworthy AI · Privacy Compliance
Jin Song Dong
Professor of Computer Science, National University of Singapore
Formal Methods · Trusted AI · Safe AI · Model Checking · Sports Analytics