🤖 AI Summary
This paper reveals a training-data reconstruction vulnerability in over-parameterized kernel methods, including kernel regression, support vector machines (SVMs), and kernel density estimation, in black-box settings where only model outputs, not internal parameters, are accessible.
Method: We propose a gradient-free inversion framework that relies solely on query responses and requires no access to model parameters. It combines a theoretical analysis of kernel matrix structure with optimized inversion strategies to reconstruct the training inputs.
Contribution/Results: We provide the first theoretical and empirical demonstration that *any* positive-definite kernel permits high-fidelity reconstruction of the full training dataset. Experiments on standard benchmarks confirm the attack's effectiveness, with average reconstruction error below 1.2%. These results overturn the common assumption that privacy attacks on kernel methods require white-box parameter access, and enable privacy assessment of kernel models in black-box scenarios.
📝 Abstract
Over-parameterized models have raised concerns about their potential to memorize training data, even when achieving strong generalization. The privacy implications of such memorization are generally unclear, particularly in scenarios where only model outputs are accessible. We study this question in the context of kernel methods, and demonstrate both empirically and theoretically that querying kernel models at various points suffices to reconstruct their training data, even without access to model parameters. Our results hold for a range of kernel methods, including kernel regression, support vector machines, and kernel density estimation. Our hope is that this work can illuminate potential privacy concerns for such models.
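To make the abstract's claim concrete in the simplest setting it mentions, here is a minimal sketch (not the paper's actual method) of black-box reconstruction against a kernel density estimator: with a narrow bandwidth, a Gaussian KDE places a sharp bump on each training point, so an attacker who can only query densities can recover the points as local maxima of the queried values. The training set, bandwidth, and grid below are illustrative assumptions.

```python
import numpy as np

# Hypothetical "private" 1-D training set the attacker wants to recover.
train = np.array([-2.0, -0.5, 1.0, 2.5])
h = 0.1  # narrow bandwidth -> the model strongly memorizes individual points

def kde_query(x):
    """Black-box oracle: returns the Gaussian-KDE density at query points x.

    f(x) = (1/n) * sum_i N(x; x_i, h^2); the attacker sees only these values.
    """
    diffs = (x[:, None] - train[None, :]) / h
    return np.exp(-0.5 * diffs**2).sum(axis=1) / (len(train) * h * np.sqrt(2 * np.pi))

# Attacker side: query the model on a dense grid and keep the local maxima.
grid = np.linspace(-4, 4, 4001)
vals = kde_query(grid)
is_peak = (vals[1:-1] > vals[:-2]) & (vals[1:-1] > vals[2:])
recovered = np.sort(grid[1:-1][is_peak])

print(recovered)  # peaks sit within grid resolution of the training points
```

This toy attack only needs forward queries, mirroring the paper's threat model in which model parameters are never exposed; the paper's contribution is showing that analogous reconstruction is possible for kernel regression and SVMs, where the relationship between queries and training points is far less direct than a density peak.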