Privacy Checklist: Privacy Violation Detection Grounding on Contextual Integrity Theory

📅 2024-08-19

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

Problem: Existing privacy research is fragmented across isolated technical domains (e.g., CV, NLP, networking) and fails to address real-world, cross-contextual privacy concerns. Method: We propose a human-centered privacy modeling framework grounded in Contextual Integrity (CI) theory and apply it with large language models (LLMs). Our approach builds a privacy checklist that combines social identities, private attributes, and the complete set of HIPAA regulations, which we describe as the first comprehensive checklist of this kind. By gathering expert annotations across multiple ontologies, it extends beyond traditional PII definitions to support context-aware privacy assessment. Contribution/Results: Preliminary results on HIPAA indicate that LLMs can be used to cover the regulation's provisions and to reason about privacy in context. We present these results as a starting point for extending context-centric privacy research to more regulations, social norms, and standards.

📝 Abstract

Privacy research has attracted wide attention as individuals worry that their private data can be easily leaked during interactions with smart devices, social platforms, and AI applications. Computer science researchers, on the other hand, commonly study privacy issues through privacy attacks and defenses in segmented fields. Privacy research is conducted in various sub-fields, including Computer Vision (CV), Natural Language Processing (NLP), and Computer Networks. Within each field, privacy has its own formulation. Though pioneering works on attacks and defenses reveal sensitive privacy issues, they are narrowly scoped and cannot fully cover people's actual privacy concerns. Consequently, general and human-centric privacy research remains rather unexplored. In this paper, we formulate the privacy issue as a reasoning problem rather than simple pattern matching. We ground our work in Contextual Integrity (CI) theory, which posits that people's perceptions of privacy are highly correlated with the corresponding social context. Based on this assumption, we develop the first comprehensive checklist that covers social identities, private attributes, and existing privacy regulations. Unlike prior works on CI that either cover limited expert-annotated norms or model incomplete social context, our proposed privacy checklist uses the whole Health Insurance Portability and Accountability Act of 1996 (HIPAA) as an example to show that we can use large language models (LLMs) to completely cover HIPAA's regulations. Additionally, our checklist gathers expert annotations across multiple ontologies to determine private information, including but not limited to personally identifiable information (PII). We use our preliminary results on HIPAA to shed light on future context-centric privacy research that covers more privacy regulations, social norms, and standards.
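
As a rough illustration of the context dependence described in the abstract, the sketch below models an information flow with the five parameters commonly used in CI theory (sender, recipient, subject, attribute, transmission principle) and checks it against a small list of norms. The class names, role labels, and the two example norms are hypothetical; they are not taken from the paper's checklist or from the text of HIPAA, and the paper relies on LLM reasoning rather than the exact matching used here.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Flow:
    """One information flow, described by the five CI parameters."""
    sender: str      # role of the party sharing the information
    recipient: str   # role of the party receiving it
    subject: str     # whom the information is about
    attribute: str   # type of information, e.g. "diagnosis"
    principle: str   # condition under which it is shared


@dataclass(frozen=True)
class Norm:
    """A hypothetical checklist entry: a flow pattern and its acceptability."""
    sender: str
    recipient: str
    attribute: str
    principle: str
    permitted: bool


def assess(flow: Flow, norms: Iterable[Norm]) -> str:
    """Return 'permitted', 'violation', or 'unknown' for a flow."""
    for norm in norms:
        if (norm.sender, norm.recipient, norm.attribute, norm.principle) == (
            flow.sender, flow.recipient, flow.attribute, flow.principle
        ):
            return "permitted" if norm.permitted else "violation"
    return "unknown"  # no norm covers this context


# Illustrative norms only (assumptions), not actual regulatory text.
NORMS = [
    Norm("physician", "specialist", "diagnosis", "for treatment", True),
    Norm("physician", "employer", "diagnosis", "without authorization", False),
]

treatment = Flow("physician", "specialist", "patient", "diagnosis", "for treatment")
leak = Flow("physician", "employer", "patient", "diagnosis", "without authorization")

print(assess(treatment, NORMS))
print(assess(leak, NORMS))
```

The same attribute (a diagnosis) receives a different verdict depending on who receives it and under what condition, which is why detecting PII by pattern matching alone cannot decide whether a flow is a violation.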

Problem

Research questions and friction points this paper is trying to address.

Detects privacy violations using Contextual Integrity theory.

Develops a checklist covering social identities and regulations.

Utilizes large language models for comprehensive privacy analysis.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Contextual Integrity Theory application

Comprehensive privacy checklist development

Large language models for HIPAA coverage
