🤖 AI Summary
This study investigates how data users understand and respond to the statistical noise introduced by privacy-preserving techniques, particularly differential privacy (DP), focusing on the cognitive challenges of constructing confidence intervals and navigating the utility–privacy trade-off. In scenario-based, task-driven interviews, users analyzed Wikipedia pageview data perturbed by DP and by rounding. Participants readily used simple uncertainty metrics but struggled to construct confidence intervals across multiple noisy estimates, and DP-perturbed data prompted simulation-based approaches to uncertainty more often than rounded data did. Notably, several participants mistakenly inferred that DP's stronger utility implied weaker privacy protection. From these findings, the study derives empirically grounded recommendations for privacy-aware data documentation and for interactive analysis tools that support uncertainty assessment.
📝 Abstract
In response to calls for open data and growing privacy threats, organizations are increasingly adopting privacy-preserving techniques, such as differential privacy (DP), that inject statistical noise into published datasets. These techniques are designed to protect the privacy of data subjects while still enabling useful analyses, but how data users receive them is under-explored. We developed documentation that presents the noise characteristics of two Wikipedia pageview datasets: one protected by rounding (heuristic privacy) and the other by DP (formal privacy). After incorporating expert feedback (n=5), we used these documents to conduct a task-based contextual inquiry (n=15) exploring how data users, largely unfamiliar with these methods, perceive, interact with, and interpret privacy-preserving noise during data analysis.
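To make the difference between the two mechanisms concrete, here is a minimal sketch of how each might perturb a single pageview count. The count, rounding granularity, and privacy-loss parameter epsilon are illustrative assumptions, not values taken from the study's datasets:

```python
import numpy as np

rng = np.random.default_rng(0)

true_count = 1_337     # hypothetical daily pageview count
rounding_base = 10     # assumed rounding granularity (heuristic privacy)
epsilon = 1.0          # assumed privacy-loss parameter (formal privacy)
sensitivity = 1        # one user changes the count by at most 1

# Heuristic privacy: round to the nearest multiple of the base.
rounded = rounding_base * round(true_count / rounding_base)

# Formal privacy: the Laplace mechanism adds noise with
# scale = sensitivity / epsilon, giving an epsilon-DP release.
dp_release = true_count + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

print(f"true: {true_count}  rounded: {rounded}  DP: {dp_release:.1f}")
```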
Participants readily used simple uncertainty metrics from the documentation but struggled when asked to compute confidence intervals across multiple noisy estimates. They were better able to devise simulation-based approaches for computing uncertainty with DP data than with rounded data. Surprisingly, several participants incorrectly believed that DP's stronger utility implied weaker privacy protection. Based on our findings, we offer design recommendations for documentation and tools that better support data users working with privacy-noised data.
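One way to sketch the kind of simulation-based approach described above, assuming a Laplace mechanism with per-day sensitivity 1: when an analyst sums several independently noised daily counts, the compound noise has no convenient closed form, but its quantiles are easy to estimate by Monte Carlo. The released counts and epsilon below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical weekly analysis: seven independently DP-noised daily counts.
released_daily = np.array([812.3, 790.1, 805.7, 951.2, 1004.8, 1120.5, 1098.9])
epsilon = 1.0            # assumed per-day privacy-loss parameter
scale = 1 / epsilon      # Laplace scale for a sensitivity-1 count query

weekly_total = released_daily.sum()

# Simulate the compound noise (a sum of 7 Laplace draws) many times and
# use its empirical quantiles to bracket the unknown true total.
sims = rng.laplace(0.0, scale, size=(100_000, 7)).sum(axis=1)
lo, hi = np.quantile(sims, [0.025, 0.975])

print(f"released weekly total: {weekly_total:.1f}")
print(f"95% CI for the true total: ({weekly_total - hi:.1f}, {weekly_total - lo:.1f})")
```

Rounding error, by contrast, is deterministic given the true value rather than drawn from a known distribution, which may help explain why participants found this recipe more natural to devise for DP data.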