Aim High, Stay Private: Differentially Private Synthetic Data Enables Public Release of Behavioral Health Information with High Utility

📅 2025-06-30
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Balancing privacy preservation and data utility remains a critical challenge in sharing sensitive behavioral health data. Method: We propose a high-utility synthetic data generation framework grounded in differential privacy (DP), integrating physiological signals from Oura rings with self-reported survey data and employing an Adaptive Iterative Mechanism (AIM) to produce ε=5-DP-compliant synthetic datasets. Contribution/Results: We introduce a task-oriented utility evaluation framework that quantitatively characterizes the trade-off between privacy budget and statistical/predictive utility across high-dimensional, large-scale behavioral health data. Experiments demonstrate that the synthetic data retain downstream modeling performance—e.g., depression risk prediction—nearly equivalent to the original data, while substantially reducing re-identification risk. To our knowledge, this is the first empirical validation in a real-world behavioral health setting that strong privacy protection (ε ≤ 5) and high data usability can be simultaneously achieved. Our work provides a reproducible technical pathway and evidence-based guidance for secure, privacy-preserving open sharing of sensitive health data.

Technology Category

Application Category

📝 Abstract
Sharing health and behavioral data raises significant privacy concerns, as conventional de-identification methods are susceptible to privacy attacks. Differential Privacy (DP) provides formal guarantees against re-identification risks, but practical implementation necessitates balancing privacy protection and the utility of data. We demonstrate the use of DP to protect individuals in a real behavioral health study, while making the data publicly available and retaining high utility for downstream users of the data. We use the Adaptive Iterative Mechanism (AIM) to generate DP synthetic data for Phase 1 of the Lived Experiences Measured Using Rings Study (LEMURS). The LEMURS dataset comprises physiological measurements from wearable devices (Oura rings) and self-reported survey data from first-year college students. We evaluate the synthetic datasets across a range of privacy budgets, epsilon = 1 to 100, focusing on the trade-off between privacy and utility. We evaluate the utility of the synthetic data using a framework informed by actual uses of the LEMURS dataset. Our evaluation identifies the trade-off between privacy and utility across synthetic datasets generated with different privacy budgets. We find that synthetic data sets with epsilon = 5 preserve adequate predictive utility while significantly mitigating privacy risks. Our methodology establishes a reproducible framework for evaluating the practical impacts of epsilon on generating private synthetic datasets with numerous attributes and records, contributing to informed decision-making in data sharing practices.
Problem

Research questions and friction points this paper is trying to address.

Balancing privacy and utility in health data sharing
Generating differentially private synthetic behavioral health data
Evaluating privacy-utility trade-offs across varying epsilon values
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Differential Privacy for synthetic data
Applies Adaptive Iterative Mechanism (AIM)
Balances privacy and utility via epsilon
🔎 Similar Papers
No similar papers found.
M
Mohsen Ghasemizade
Department of Computer Science, University of Vermont
J
Juniper Lovato
Department of Computer Science, University of Vermont
C
Christopher M. Danforth
Department of Mathematics and Statistics, University of Vermont
Peter Sheridan Dodds
Peter Sheridan Dodds
Professor/Director, Computational Story Lab, Vermont Complex Systems Institute, UVM
LanguageMeaningStoriesSociotechnical PhenomenaComplex Systems
L
Laura S. P. Bloomfield
The Gund Institute for the Environment, University of Vermont
Matthew Price
Matthew Price
University of Vermont
PsychologyPTSDTechnologySocial Anxiety
T
Team LEMURS
University of Vermont
Joseph P. Near
Joseph P. Near
University of Vermont
Security & PrivacyDifferential PrivacyProgramming LanguagesFormal MethodsMachine Learning