A Privacy-Preserving Data Collection Method for Diversified Statistical Analysis

📅 2025-07-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing privacy-preserving methods primarily target discrete data or single-point statistics, and struggle to preserve distributional fidelity while supporting diverse analytical tasks on real-valued sensitive information. To address this, we propose RVNS, the first Negative Survey (NS) framework tailored for real-valued data: instead of discretizing, each user reports a set of values sampled from a range that deviates from their true value. The underlying distribution is then recovered by solving an optimization problem, so that individual privacy and aggregate data utility are addressed together. We formally prove that RVNS satisfies ε-differential privacy. Experiments on multiple real and synthetic datasets show that RVNS outperforms conventional perturbation mechanisms, reconstructing the underlying distribution more accurately and supporting diverse statistical analyses, including mean, variance, and quantile estimation as well as machine learning tasks.

📝 Abstract

Data perturbation-based privacy-preserving methods have been widely adopted because they are efficient and do not require a trusted third party. However, these methods focus primarily on individual statistical indicators and neglect the overall quality of the collected data from a distributional perspective. Consequently, they often fall short of the diverse statistical analysis requirements encountered in practice. The negative survey is a promising perturbation method that can collect the distribution of sensitive information while protecting personal privacy. Yet existing negative survey methods are designed primarily for discrete sensitive information and are inadequate for real-valued data distributions. To bridge this gap, this paper proposes a real-valued negative survey model, termed RVNS, the first such model for collecting real-valued sensitive information. RVNS does not require users to discretize their data; it only requires them to sample a set of values from a range that deviates from their actual sensitive value, thereby keeping their genuine information private. Moreover, to accurately capture the distribution of sensitive information, an optimization problem is formulated, and a novel approach is employed to solve it. Rigorous theoretical analysis demonstrates that the RVNS model conforms to the differential privacy model, ensuring robust privacy preservation. Comprehensive experiments on both synthetic and real-world datasets further validate the efficacy of the proposed method.
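
To make the user-side step concrete, here is a minimal Python sketch of sampling "from a range that deviates from" the true value. It is an illustration under stated assumptions, not the RVNS mechanism itself: the abstract does not specify the sampling distribution, how the deviating range is defined, or how many values are reported. The function name `negative_sample`, the domain `[low, high]`, the exclusion `radius`, the sample count `k`, and the uniform draw are all hypothetical choices. The sketch covers only the sampling step and carries no differential privacy guarantee.

```python
import random


def negative_sample(true_value, low=0.0, high=1.0, radius=0.1, k=3, rng=None):
    """Draw k values uniformly from [low, high], excluding the open
    interval (true_value - radius, true_value + radius).

    All parameters are illustrative assumptions, not taken from the paper.
    """
    rng = rng or random.Random()
    if not low <= true_value <= high:
        raise ValueError("true_value must lie inside [low, high]")

    # Bounds of the two allowed segments on either side of the exclusion zone.
    left_hi = max(low, true_value - radius)
    right_lo = min(high, true_value + radius)
    left_len = left_hi - low
    right_len = high - right_lo
    total = left_len + right_len
    if total <= 0:
        raise ValueError("exclusion zone covers the whole domain")

    samples = []
    for _ in range(k):
        # Pick a point on the concatenated allowed segments, then map it back.
        u = rng.random() * total
        if u < left_len:
            samples.append(low + u)
        else:
            samples.append(right_lo + (u - left_len))
    return samples


if __name__ == "__main__":
    # A user whose true (normalized) value is 0.42 reports three "negative" values.
    print(negative_sample(0.42, rng=random.Random(0)))
```

In this toy version the collector only ever sees values the user does not hold; recovering the population distribution from such reports is the optimization problem the abstract refers to, and it is not reproduced here.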

Problem

Research questions and friction points this paper is trying to address.

Addresses limitations of data perturbation methods for diverse statistical analysis

Proposes the RVNS model for collecting real-valued sensitive data without discretization

Preserves privacy under differential privacy while accurately capturing the data distribution

Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes RVNS, a real-valued negative survey model

Formulates optimization problem for distribution capture

Ensures differential privacy with theoretical analysis

🔎 Similar Papers

A Survey on Federated Analytics: Taxonomy, Enabling Techniques, Applications and Open Issues