On Reconstructing Training Data From Bayesian Posteriors and Trained Models

📅 2025-07-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work exposes a critical privacy risk: Bayesian posterior distributions and trained model parameters may inadvertently leak the original training data. Addressing the lack of systematic analysis of reconstruction attacks against Bayesian models, the authors propose the first unified score-matching framework capable of reconstructing training data from both Bayesian and non-Bayesian models. They further introduce the first theoretical characterization of attack-vulnerable data, leveraging the equivalence between maximum mean discrepancy (MMD) and kernel Stein discrepancy to derive a mathematical criterion for training data recoverability. Through rigorous theoretical modeling and adversarial experiments, they quantitatively delineate the privacy boundary between model parameters and training data, precisely identifying the dimensions of information that remain reconstructible. The results establish a novel paradigm for understanding data leakage mechanisms in machine learning and provide foundational theory for designing privacy-preserving defenses, including differential privacy and secure model publishing.
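The leakage mechanism the summary describes can be illustrated with a toy sketch. This is an illustrative assumption, not the paper's actual attack: suppose the "published" model is simply a Gaussian fitted to private training data. Its score function, grad log p(x) = -Sigma^{-1}(x - mu), is then fully determined by the released parameters, so an adversary starting from pure noise can ascend the model's log-density and recover where the training data concentrated.

```python
import numpy as np

rng = np.random.default_rng(1)

# Private training data (never released)
train = rng.normal(loc=[2.0, -1.0], scale=0.3, size=(500, 2))

# "Published" model: a Gaussian fitted to the private data
mu, cov = train.mean(axis=0), np.cov(train.T)
prec = np.linalg.inv(cov)

def score(x):
    # Score of the released Gaussian: grad log p(x) = -prec @ (x - mu),
    # computable from the published parameters alone
    return -(x - mu) @ prec

# The adversary starts from noise and follows the model's score field
x = rng.normal(size=(100, 2)) * 5.0
for _ in range(200):
    x = x + 0.01 * score(x)

# The iterates collapse toward the region of the private training data,
# recovering its location purely from the released parameters
```

In this contrived setting the attack recovers only the mode of the fitted density; the paper's framework generalizes the same score-based idea to reconstruct training data from richer Bayesian posteriors and trained models.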

📝 Abstract
Publicly releasing the specification of a model together with its trained parameters means an adversary can attempt to reconstruct information about the training data via training data reconstruction attacks, a major vulnerability of modern machine learning methods. This paper makes three primary contributions: establishing a mathematical framework to express the problem; characterising the features of the training data that are vulnerable via a maximum mean discrepancy equivalence; and outlining a score matching framework for reconstructing data in both Bayesian and non-Bayesian models, the former being a first in the literature.
Problem

Research questions and friction points this paper is trying to address.

Analyzing training data reconstruction attacks in machine learning
Characterizing vulnerable training data features via maximum mean discrepancy
Developing score matching for data reconstruction in Bayesian models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mathematical framework for training data reconstruction
Maximum mean discrepancy for vulnerable features
Score matching for Bayesian and non-Bayesian models
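The MMD quantity underlying the vulnerability characterisation can be sketched with a minimal kernel two-sample estimate (a generic RBF-kernel sketch, not the paper's derivation): MMD^2 between two samples is near zero when they come from the same distribution and grows as the distributions separate, which is what makes it usable as a criterion for how well reconstructed data matches the true training data.

```python
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    # Gaussian RBF kernel matrix between the rows of x and y
    d2 = np.sum(x**2, axis=1)[:, None] + np.sum(y**2, axis=1)[None, :] - 2 * x @ y.T
    return np.exp(-d2 / (2 * sigma**2))

def mmd2(x, y, sigma=1.0):
    # Biased estimate of squared maximum mean discrepancy:
    # E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)]
    return (rbf_kernel(x, x, sigma).mean()
            + rbf_kernel(y, y, sigma).mean()
            - 2 * rbf_kernel(x, y, sigma).mean())

rng = np.random.default_rng(0)
same = mmd2(rng.normal(size=(200, 2)), rng.normal(size=(200, 2)))
diff = mmd2(rng.normal(size=(200, 2)), rng.normal(3.0, 1.0, size=(200, 2)))
# same distribution -> MMD^2 near zero; shifted distribution -> clearly larger
```

The paper's criterion goes further, using the equivalence between MMD and kernel Stein discrepancy so that the comparison can be made against a model's score function rather than against samples.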