Efficient Techniques for Data Reconstruction, with Finite-Width Recovery Guarantees

📅 2026-05-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Trained neural networks are vulnerable to data reconstruction attacks, which can leak sensitive training data. This work proposes a unified optimization formulation of the reconstruction problem, based on initial and trained parameter values, together with a subspace-aware algorithm: it estimates the low-dimensional subspace containing the training inputs from the change in the first-layer weights during training, then reconstructs the inputs within that subspace using only the last-layer weights, which reduces the search dimension. On the theory side, it gives a PAC-style, finite-width guarantee of high-probability training data recovery in the random feature model, and shows that when the data lies in a low-dimensional subspace the required width depends on the subspace dimension rather than the ambient input dimension. Experiments on synthetic data and CIFAR-10 show that the subspace-aware method outperforms standard full-space reconstruction techniques.
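The subspace-estimation step can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the two-layer bias-free ReLU network, the squared loss, plain gradient descent without weight decay, and the names `train_two_layer` and `estimate_subspace` are all assumptions made for illustration. Under those assumptions every first-layer gradient is a sum of outer products whose right factor is a training input, so the rows of the weight change lie in the span of the data, and an SVD of that change recovers a basis for the data subspace.

```python
import numpy as np

def train_two_layer(X, y, width=64, steps=200, lr=0.05, seed=0):
    """Plain gradient descent on f(x) = a^T relu(W x) with loss (1/2n) * sum(err^2).

    Hypothetical toy setup: no biases, no weight decay, both layers trained.
    Returns the initial and trained first-layer weights and the trained last layer.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W0 = rng.normal(size=(width, d)) / np.sqrt(d)
    a = rng.normal(size=width) / np.sqrt(width)
    W = W0.copy()
    for _ in range(steps):
        pre = X @ W.T                     # pre-activations, shape (n, width)
        h = np.maximum(pre, 0.0)          # ReLU features
        err = h @ a - y                   # residuals, shape (n,)
        grad_a = h.T @ err / n
        # dL/dW_j = (1/n) * sum_i err_i * a_j * 1[pre_ij > 0] * x_i
        grad_W = ((err[:, None] * (pre > 0)) * a).T @ X / n
        a = a - lr * grad_a
        W = W - lr * grad_W
    return W0, W, a

def estimate_subspace(W0, W, k):
    """Top-k right singular vectors of the first-layer weight change.

    Returns an orthonormal (d, k) basis and all singular values.
    """
    _, s, Vt = np.linalg.svd(W - W0, full_matrices=False)
    return Vt[:k].T, s

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    d, k, n = 20, 3, 10                   # ambient dim, subspace dim, sample count
    B, _ = np.linalg.qr(rng.normal(size=(d, k)))   # true (hidden) subspace basis
    X = rng.normal(size=(n, k)) @ B.T     # training inputs inside span(B)
    y = rng.normal(size=n)
    W0, W, a = train_two_layer(X, y)
    V, s = estimate_subspace(W0, W, k)
    # Relative error of projecting the true inputs onto the estimated subspace.
    resid = np.linalg.norm(X - X @ V @ V.T) / np.linalg.norm(X)
    print("relative projection residual:", resid)
```

In this toy setting the subspace dimension `k` is passed in by hand; in practice it would have to be chosen, for example from the decay of the singular values `s`, and regularization or stochastic training may blur the low-rank structure.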

📝 Abstract

Data reconstruction attacks on trained neural networks aim to recover the data on which the network has been trained and pose a significant threat to privacy, especially if the training dataset contains sensitive information. Here, we propose a unified optimization formulation of the data reconstruction problem based on initial and trained parameter values, incorporating state-of-the-art proposals. We show that in the random feature model, this formulation provably leads to training data reconstruction with high probability, provided the network width is sufficiently large; this unprecedented finite-width result uses PAC-style bounds. Furthermore, when the data lies in a low-dimensional subspace, we show that the network width requirement for successful reconstruction can be relaxed, with bounds depending on the subspace dimension rather than the ambient dimension. For general neural network models and unknown data orientations, we propose an efficient reconstruction algorithm that approximates the low-dimensional data subspace through the change in the first-layer weights during training and uses only the last-layer weights for reconstruction, thus reducing the search space dimension and the required network width for high-quality reconstructions. Our numerical experiments on synthetic datasets and CIFAR-10 confirm that our subspace-aware reconstruction approach outperforms standard full-space techniques.
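To make the role of the last-layer weights and of the subspace dimension concrete, here is a short worked derivation in a simplified random feature setting. The notation (features $\phi$, fixed first layer $W$, last layer $a$, step size $\eta$, residuals $r_{i,t}$, basis $V$) and the squared loss with plain gradient descent are assumptions for illustration, and the reconstruction objective shown is one natural way to pose the problem that may differ from the paper's exact formulation.

```latex
% Random feature model: W is fixed at initialization, only a is trained.
f(x) = a^\top \phi(Wx), \qquad
L(a) = \frac{1}{2n} \sum_{i=1}^{n} \bigl( a^\top \phi(W x_i) - y_i \bigr)^2 .

% Gradient descent with residuals r_{i,t} = a_t^\top \phi(W x_i) - y_i:
a_{t+1} = a_t - \frac{\eta}{n} \sum_{i=1}^{n} r_{i,t}\, \phi(W x_i)
\;\;\Longrightarrow\;\;
a_T - a_0 = \sum_{i=1}^{n} \alpha_i\, \phi(W x_i), \qquad
\alpha_i = -\frac{\eta}{n} \sum_{t=0}^{T-1} r_{i,t} .

% A reconstruction objective over candidate inputs \hat{x}_i and coefficients \hat{\alpha}_i:
\min_{\hat{x}_1,\dots,\hat{x}_n,\; \hat{\alpha}}
\Bigl\| \, a_T - a_0 - \sum_{i=1}^{n} \hat{\alpha}_i\, \phi(W \hat{x}_i) \Bigr\|_2^2 .

% If the data lies in a k-dimensional subspace with basis V \in \mathbb{R}^{d \times k},
% substitute \hat{x}_i = V \hat{z}_i with \hat{z}_i \in \mathbb{R}^k:
n(d+1) \text{ unknowns} \;\longrightarrow\; n(k+1) \text{ unknowns}.
```

The change in the last-layer weights supplies one equation per hidden unit, so fewer unknowns per sample is consistent with the abstract's claim that the width requirement can scale with the subspace dimension $k$ rather than the ambient dimension $d$.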

Problem

Research questions and friction points this paper is trying to address.

data reconstruction

privacy

neural networks

finite-width guarantees

low-dimensional subspace

Innovation

Methods, ideas, or system contributions that make the work stand out.

data reconstruction

finite-width guarantees

low-dimensional subspace