🤖 AI Summary
This paper reveals a training-data reconstruction vulnerability in over-parameterized kernel methods, including kernel regression, support vector machines (SVMs), and kernel density estimation, in black-box settings where only model outputs, not internal parameters, are accessible.
Method: We propose a gradient-free inversion framework that relies solely on query responses and requires no access to model parameters. It combines a theoretical analysis of kernel matrix structure with optimized inversion strategies to reconstruct the training inputs.
Contribution/Results: We provide the first theoretical and empirical demonstration that *any* positive-definite kernel permits high-fidelity reconstruction of the full training dataset. Experiments on standard benchmarks confirm the attack's effectiveness, with average reconstruction error below 1.2%. These results overturn the common assumption that privacy attacks on kernel methods require white-box parameter access, and enable privacy assessment of kernel models in black-box scenarios.
📝 Abstract
Over-parameterized models have raised concerns about their potential to memorize training data, even when achieving strong generalization. The privacy implications of such memorization are generally unclear, particularly in scenarios where only model outputs are accessible. We study this question in the context of kernel methods, and demonstrate both empirically and theoretically that querying kernel models at various points suffices to reconstruct their training data, even without access to model parameters. Our results hold for a range of kernel methods, including kernel regression, support vector machines, and kernel density estimation. Our hope is that this work can illuminate potential privacy concerns for such models.
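To make the abstract's claim concrete in the simplest setting it mentions, here is a minimal sketch (not the paper's actual method) of black-box reconstruction against a kernel density estimator: with a narrow bandwidth, a Gaussian KDE places a sharp bump on each training point, so an attacker who can only query densities can recover the points as local maxima of the queried values. The training set, bandwidth, and grid below are illustrative assumptions.

```python
import numpy as np

# Hypothetical "private" 1-D training set the attacker wants to recover.
train = np.array([-2.0, -0.5, 1.0, 2.5])
h = 0.1  # narrow bandwidth -> the model strongly memorizes individual points

def kde_query(x):
    """Black-box oracle: returns the Gaussian-KDE density at query points x.

    f(x) = (1/n) * sum_i N(x; x_i, h^2); the attacker sees only these values.
    """
    diffs = (x[:, None] - train[None, :]) / h
    return np.exp(-0.5 * diffs**2).sum(axis=1) / (len(train) * h * np.sqrt(2 * np.pi))

# Attacker side: query the model on a dense grid and keep the local maxima.
grid = np.linspace(-4, 4, 4001)
vals = kde_query(grid)
is_peak = (vals[1:-1] > vals[:-2]) & (vals[1:-1] > vals[2:])
recovered = np.sort(grid[1:-1][is_peak])

print(recovered)  # peaks sit within grid resolution of the training points
```

This toy attack only needs forward queries, mirroring the paper's threat model in which model parameters are never exposed; the paper's contribution is showing that analogous reconstruction is possible for kernel regression and SVMs, where the relationship between queries and training points is far less direct than a density peak.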