Which Code Statements Implement Privacy Behaviors in Android Applications?

📅 2025-03-03
🤖 AI Summary
This study addresses the limitation of existing Android privacy analysis approaches, which operate at coarse-grained levels (e.g., components or APIs) and lack statement-level precision. To bridge this gap, we conduct an empirical user study with professional developers to identify statements widely recognized as privacy-relevant. Based on these insights, we construct the first fine-grained, statement-level privacy code annotation dataset. We then propose a privacy statement detection model built upon fine-tuned large language models (LLaMA-2, CodeLlama, and StarCoder). Key contributions include: (1) the first empirical finding that expression statements—particularly function calls—are the primary carriers of privacy behavior; (2) model–human agreement reaching κ = 0.82, matching or exceeding inter-annotator agreement (κ = 0.79); and (3) substantial improvements in accuracy and interpretability for privacy policy generation and regulatory compliance auditing.

📝 Abstract
A "privacy behavior" in software is an action where the software uses personal information for a service or a feature, such as a website using location to provide content relevant to a user. Programmers are required by regulations or application stores to provide privacy notices and labels describing these privacy behaviors. Although many tools and research prototypes have been developed to help programmers generate these notices by analyzing the source code, these approaches are often fairly coarse-grained (i.e., at the level of whole methods or files, rather than at the statement level). But this is not necessarily how privacy behaviors exist in code: they are embedded in specific statements. Current literature does not examine which statements programmers see as most important, how consistent these views are, or how to detect such statements. In this paper, we conduct an empirical study to examine which statements programmers view as most related to privacy behaviors. We find that expression statements that make function calls are most associated with privacy behaviors, while the type of privacy label has little effect on the attributes of the selected statements. We then propose an approach to automatically detect these privacy-relevant statements by fine-tuning three large language models with the data from the study. We observe that the agreement between our approach and participants is comparable to or higher than the agreement between two participants. Our study and detection approach can help programmers understand which statements in code affect privacy in mobile applications.
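
The comparison between model-participant agreement and participant-participant agreement can be made concrete with a small calculation. The abstract does not name the agreement statistic; the summary above reports κ values, so the sketch below assumes Cohen's kappa over binary per-statement labels. The label lists are invented for illustration and are not taken from the study's data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items on which both raters agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a.keys() | freq_b.keys()) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical per-statement labels: 1 = privacy-relevant, 0 = not.
participant = [1, 0, 0, 1, 0, 0, 1, 0]
model = [1, 0, 0, 1, 0, 1, 1, 0]

kappa = cohens_kappa(participant, model)
print(f"kappa = {kappa:.2f}")
```

Running the same function on two participants' labels for the same statements gives the human baseline against which the model's score would be compared.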

Problem

Research questions and friction points this paper is trying to address.

Identify code statements implementing privacy behaviors in Android apps.
Evaluate programmer perceptions of privacy-relevant statements in code.
Develop automated detection of privacy-relevant statements using language models.
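
To illustrate what statement-level annotation looks like compared with labeling a whole method, here is a minimal sketch. The record layout, the Java statements, and the selections are all hypothetical (helpers such as `getLocationManager`, `api.uploadPosition`, and `analytics.log` are invented for the example); the statement types are supplied by hand rather than parsed. It only shows how one might tally which statement types annotators select.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Statement:
    text: str       # one Java statement from an Android method
    kind: str       # statement type, assigned by hand here (not parsed)
    selected: bool  # True if an annotator marked it privacy-relevant


# Hypothetical annotation of one method tagged with a "location" privacy label.
method = [
    Statement("LocationManager lm = getLocationManager();", "declaration", False),
    Statement("Location loc = lm.getLastKnownLocation(provider);", "declaration", True),
    Statement("if (loc == null) return;", "if", False),
    Statement("api.uploadPosition(loc.getLatitude(), loc.getLongitude());", "expression", True),
    Statement('analytics.log("nearby_search", loc);', "expression", True),
    Statement("adapter.notifyDataSetChanged();", "expression", False),
]


def selected_share_by_kind(statements):
    """Fraction of selected statements that fall into each statement type."""
    picked = Counter(s.kind for s in statements if s.selected)
    total = sum(picked.values())
    if total == 0:
        return {}
    return {kind: count / total for kind, count in picked.items()}


shares = selected_share_by_kind(method)
for kind, share in sorted(shares.items()):
    print(f"{kind}: {share:.0%}")
```

A method-level label would mark all six statements alike; the per-statement records separate the three that actually read or transmit the location from the surrounding plumbing.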

Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuned large language models for detection
Empirical study on privacy-relevant code statements
Automated detection of privacy behaviors in code