🤖 AI Summary
This study investigates how data users understand and respond to the statistical noise introduced by privacy-preserving techniques, particularly differential privacy (DP), focusing on the cognitive challenges of constructing confidence intervals and navigating the utility–privacy trade-off. In scenario-based, task-driven interviews, users analyzed Wikipedia pageview data perturbed by DP and by rounding. Participants readily used simple uncertainty metrics but struggled to construct confidence intervals across multiple noisy estimates, and DP-perturbed data prompted simulation-based approaches to uncertainty more often than rounded data did. Notably, several participants mistakenly inferred that DP's stronger utility implied weaker privacy protection. From these findings, the study derives empirically grounded recommendations for privacy-aware data documentation and for interactive analysis tools that support uncertainty assessment.
📝 Abstract
In response to calls for open data and growing privacy threats, organizations are increasingly adopting privacy-preserving techniques, such as differential privacy (DP), that inject statistical noise into published datasets. These techniques are designed to protect the privacy of data subjects while still enabling useful analyses, but how data users receive them is under-explored. We developed documentation that presents the noise characteristics of two Wikipedia pageview datasets: one protected by rounding (heuristic privacy) and the other by DP (formal privacy). After incorporating expert feedback (n=5), we used these documents to conduct a task-based contextual inquiry (n=15) exploring how data users, largely unfamiliar with these methods, perceive, interact with, and interpret privacy-preserving noise during data analysis.
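To make the difference between the two mechanisms concrete, here is a minimal sketch of how each might perturb a single pageview count. The count, rounding granularity, and privacy-loss parameter epsilon are illustrative assumptions, not values taken from the study's datasets:

```python
import numpy as np

rng = np.random.default_rng(0)

true_count = 1_337     # hypothetical daily pageview count
rounding_base = 10     # assumed rounding granularity (heuristic privacy)
epsilon = 1.0          # assumed privacy-loss parameter (formal privacy)
sensitivity = 1        # one user changes the count by at most 1

# Heuristic privacy: round to the nearest multiple of the base.
rounded = rounding_base * round(true_count / rounding_base)

# Formal privacy: the Laplace mechanism adds noise with
# scale = sensitivity / epsilon, giving an epsilon-DP release.
dp_release = true_count + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

print(f"true: {true_count}  rounded: {rounded}  DP: {dp_release:.1f}")
```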
Participants readily used simple uncertainty metrics from the documentation but struggled when asked to compute confidence intervals across multiple noisy estimates. They were better able to devise simulation-based approaches for computing uncertainty with DP data than with rounded data. Surprisingly, several participants incorrectly believed that DP's stronger utility implied weaker privacy protection. Based on our findings, we offer design recommendations for documentation and tools that better support data users working with privacy-noised data.
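One way to sketch the kind of simulation-based approach described above, assuming a Laplace mechanism with per-day sensitivity 1: when an analyst sums several independently noised daily counts, the compound noise has no convenient closed form, but its quantiles are easy to estimate by Monte Carlo. The released counts and epsilon below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical weekly analysis: seven independently DP-noised daily counts.
released_daily = np.array([812.3, 790.1, 805.7, 951.2, 1004.8, 1120.5, 1098.9])
epsilon = 1.0            # assumed per-day privacy-loss parameter
scale = 1 / epsilon      # Laplace scale for a sensitivity-1 count query

weekly_total = released_daily.sum()

# Simulate the compound noise (a sum of 7 Laplace draws) many times and
# use its empirical quantiles to bracket the unknown true total.
sims = rng.laplace(0.0, scale, size=(100_000, 7)).sum(axis=1)
lo, hi = np.quantile(sims, [0.025, 0.975])

print(f"released weekly total: {weekly_total:.1f}")
print(f"95% CI for the true total: ({weekly_total - hi:.1f}, {weekly_total - lo:.1f})")
```

Rounding error, by contrast, is deterministic given the true value rather than drawn from a known distribution, which may help explain why participants found this recipe more natural to devise for DP data.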