🤖 AI Summary
This study addresses privacy policy misrepresentation and unauthorized data exfiltration in Android third-party SDKs. It combines large language model (LLM)-based analysis of privacy policies with static taint analysis of SDK code. Across 158 widely used SDKs drawn from two SDK release platforms (the official one and a large alternative one), taint analysis identifies 338 instances of privacy data exfiltration. More than 30% of the SDKs provide no privacy policy; among those that do, 37% over-collect user data and 88% falsely claim access to sensitive data. A revisit of the latest SDK versions 12 months later shows no improvement. The key contributions are threefold: (1) a combined use of LLMs and taint analysis to check consistency between policy text and actual code behavior; (2) empirical evidence of widespread policy-code divergence; and (3) three actionable recommendations to mitigate privacy leakage risks and inform regulatory intervention.
📝 Abstract
Third-party Software Development Kits (SDKs) are widely adopted in Android app development to accelerate development pipelines and enhance app functionality. However, this convenience raises substantial concerns about unauthorized access to users' privacy-sensitive information, which could be further abused for illegitimate purposes such as user tracking or monetization. Our study offers a targeted analysis of user privacy protection among Android third-party SDKs, filling a critical gap in the Android software supply chain. It focuses on two aspects of their privacy practices, namely data exfiltration and behavior-policy compliance (or privacy compliance), using taint analysis and large language models. It covers 158 widely used SDKs from two key SDK release platforms, the official one and a large alternative one. Among them, we identified 338 instances of privacy data exfiltration. On privacy compliance, our study reveals that more than 30% of the examined SDKs fail to provide a privacy policy disclosing their data handling practices. Among those that do provide privacy policies, 37% over-collect user data, and 88% falsely claim access to sensitive data. We revisited the latest versions of the SDKs after 12 months, and our analysis shows a persistent lack of improvement in these concerning trends. Based on our findings, we propose three actionable recommendations to mitigate privacy leakage risks and enhance privacy protection for Android users. Our research not only serves as an urgent call for industry attention but also provides crucial insights for future regulatory interventions.
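The behavior-policy compliance check described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a policy has already been reduced to a set of declared data-type labels (for example by an LLM) and that taint analysis has produced a set of data types observed flowing to a sink. It also assumes that "over-collect" means data accessed in code but not disclosed, and that "falsely claim access" means data disclosed in the policy but never accessed in code. The names `ComplianceReport` and `check_compliance` and the data-type labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComplianceReport:
    has_policy: bool
    over_collected: frozenset  # accessed in code but not disclosed (assumed definition)
    over_claimed: frozenset    # disclosed but never accessed in code (assumed definition)


def check_compliance(declared, observed):
    """Compare policy-declared data types with those observed in code.

    declared: iterable of data-type labels, or None if the SDK has no policy.
    observed: iterable of data-type labels found by taint analysis.
    """
    observed_set = frozenset(observed)
    if declared is None:
        # No policy at all: every observed access is undisclosed.
        return ComplianceReport(False, observed_set, frozenset())
    declared_set = frozenset(declared)
    return ComplianceReport(
        True,
        observed_set - declared_set,  # set difference: in code, not in policy
        declared_set - observed_set,  # set difference: in policy, not in code
    )


# Hypothetical SDK: the policy mentions location, the code reads contacts.
report = check_compliance(
    declared={"device_id", "location"},
    observed={"device_id", "contacts"},
)
print(sorted(report.over_collected))  # ['contacts']
print(sorted(report.over_claimed))    # ['location']
```

Reducing both sides to sets of labels keeps the comparison itself trivial; the difficult parts in practice are extracting the declared set from policy text and the observed set from code.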