A Privacy-Preserving Data Collection Method for Diversified Statistical Analysis

📅 2025-07-23

🤖 AI Summary
Existing privacy-preserving methods primarily target discrete data or single-point statistics, and they struggle to preserve distributional fidelity while supporting diverse analytical tasks on real-valued sensitive information. To address this, we propose RVNS, the first Negative Survey (NS) framework tailored for real-valued data: users report values sampled from a range that deviates from their true value, avoiding discretization entirely. The collector then recovers the distribution of the sensitive data by solving an optimization problem, balancing individual privacy against aggregate data utility. We formally prove that RVNS satisfies ε-differential privacy. Experiments on multiple real and synthetic datasets demonstrate that RVNS significantly outperforms conventional perturbation mechanisms, enabling more accurate reconstruction of the underlying distribution and supporting diverse statistical analyses, including mean, variance, quantile estimation, and machine learning tasks.

📝 Abstract
Data perturbation-based privacy-preserving methods have been widely adopted in various scenarios because they are efficient and do not require a trusted third party. However, these methods primarily focus on individual statistical indicators and neglect the overall quality of the collected data from a distributional perspective. Consequently, they often fall short of the diverse statistical analysis requirements encountered in practical data analysis. Negative survey methods are a promising approach to sensitive data perturbation: they can collect the distribution of sensitive information while protecting personal privacy. Yet existing negative survey methods are primarily designed for discrete sensitive information and are inadequate for real-valued data distributions. To bridge this gap, this paper proposes a novel real-value negative survey model, termed RVNS, which is the first negative survey model for collecting real-valued sensitive information. The RVNS model does not require users to discretize their data; it only requires them to sample a set of data from a range that deviates from their actual sensitive value, thereby preserving the privacy of their genuine information. Moreover, to accurately capture the distribution of sensitive information, an optimization problem is formulated, and a novel approach is employed to solve it. Rigorous theoretical analysis demonstrates that the RVNS model conforms to the differential privacy model, ensuring robust privacy preservation. Comprehensive experiments conducted on both synthetic and real-world datasets further validate the efficacy of the proposed method.
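
As background for the negative survey idea mentioned in the abstract, the following is a minimal sketch of the classical discrete uniform negative survey, not the RVNS method itself (the abstract does not give its sampling range or its optimization problem). Each user reports one category that is not their true one, chosen uniformly at random, and the collector estimates the true counts as t_i ≈ N - (c - 1) r_i, where r_i is the number of negative reports for category i, N is the number of users, and c is the number of categories. The function names and the simulation helper are illustrative assumptions.

```python
import random


def negative_response(true_category: int, num_categories: int, rng: random.Random) -> int:
    """Return a category drawn uniformly from all categories except the true one."""
    choice = rng.randrange(num_categories - 1)
    # Skip over the true category so it can never be reported.
    return choice if choice < true_category else choice + 1


def reconstruct_counts(negative_counts: list[int]) -> list[int]:
    """Estimate true counts from negative counts: t_i = N - (c - 1) * r_i."""
    n = sum(negative_counts)
    c = len(negative_counts)
    return [n - (c - 1) * r for r in negative_counts]


def simulate(true_counts: list[int], seed: int = 0) -> list[int]:
    """Simulate a survey (illustrative helper) and return the estimated counts."""
    rng = random.Random(seed)
    c = len(true_counts)
    negative_counts = [0] * c
    for category, count in enumerate(true_counts):
        for _ in range(count):
            negative_counts[negative_response(category, c, rng)] += 1
    return reconstruct_counts(negative_counts)
```

Calling `simulate([5000, 3000, 2000])` returns estimates that fluctuate around the true counts and always sum to the number of users. Individual estimates can be negative for small samples, which is one reason more careful reconstruction methods exist.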
Problem

Research questions and friction points this paper is trying to address.

Addresses limitations of data perturbation methods for diverse statistical analysis
Proposes RVNS model for real-value sensitive data collection without discretization
Ensures privacy preservation under differential privacy while accurately capturing the data distribution
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes real-value negative survey model RVNS
Formulates optimization problem for distribution capture
Ensures differential privacy with theoretical analysis
Hao Jiang
Key Laboratory of Intelligent Computing and Signal Processing of the Ministry of Education, School of Computer Science and Technology, Anhui University, Hefei 230601, China
Quan Zhou
Key Laboratory of Intelligent Computing and Signal Processing of the Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei 230601, China
Dongdong Zhao
Wuhan University of Technology
Biometrics Security, Privacy-preserving Deep Learning, Artificial Intelligence Security
Shangshang Yang
Key Laboratory of Intelligent Computing and Signal Processing of the Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei 230601, China
Wenjian Luo
Professor, School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen
AI and Security, Intelligent Security, Secure Intelligence, Privacy Computation, Immune Computation
Xingyi Zhang
MBZUAI
graph representation learning, AI4Science, geometric deep learning