🤖 AI Summary
This work investigates reverse-engineering attacks against obfuscated embeddings in the black-box setting: given only obfuscated embedding vectors and a publicly available embedding table—without access to the underlying language model or obfuscation mechanism—the goal is to reconstruct the original token sequence. We propose a language-aware joint estimation framework that, for the first time, unifies language-model priors (e.g., n-gram statistics or lightweight LMs) with noise-parameter modeling (Laplacian/Gaussian) in a single Bayesian formulation, coupled with beam-search decoding for embedding reconstruction. Compared to naive distance-based baselines, our method achieves significantly higher token recovery accuracy. Our results expose a fundamental vulnerability of input-agnostic, fixed-noise obfuscation mechanisms for embedding-level privacy protection, demonstrating their insufficiency against informed adversaries. We thus argue for input-dependent, learnable obfuscation strategies that adapt to linguistic context and semantic structure.
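The summary does not spell out how the noise parameters are estimated. Below is a minimal illustrative sketch (not the paper's actual procedure) for one plausible sub-step under an assumed isotropic Gaussian mechanism: assign each obfuscated vector to its nearest table embedding, then fit the noise scale by maximum likelihood from the residuals. The function name `estimate_noise_scale` and all shapes are hypothetical.

```python
import numpy as np

def estimate_noise_scale(obf_embs, emb_table):
    """Toy noise-scale estimate under assumed isotropic Gaussian noise.

    obf_embs:  (T, d) obfuscated embeddings (token embedding + noise)
    emb_table: (V, d) public embedding table
    Returns (sigma_hat, nearest-token indices).
    """
    # With isotropic noise the nearest-neighbor assignment does not
    # depend on sigma, so a single hard assignment suffices here.
    sq_dists = ((obf_embs[:, None, :] - emb_table[None, :, :]) ** 2).sum(-1)  # (T, V)
    nearest = sq_dists.argmin(axis=1)
    # MLE of sigma from the per-coordinate residuals.
    resid = obf_embs - emb_table[nearest]
    return float(np.sqrt((resid ** 2).mean())), nearest
```

In a full joint-estimation scheme, this hard assignment would instead be informed by the language-model prior, but the residual-based fit of the noise scale works the same way.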
📝 Abstract
In this work, we consider an inversion attack on the obfuscated input embeddings sent to a language model on a server, where the adversary has no access to the language model or the obfuscation mechanism and sees only the obfuscated embeddings along with the model's embedding table. We propose BeamClean, an inversion attack that jointly estimates the noise parameters and decodes token sequences by integrating a language-model prior. Against Laplacian and Gaussian obfuscation mechanisms, BeamClean consistently surpasses naive distance-based attacks. This work underscores the need for more robust, learned, input-dependent obfuscation mechanisms.
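To make the decoding step concrete, here is a minimal sketch of beam search that combines a Gaussian embedding likelihood with a language-model prior, in the spirit of the attack described above. This is not the paper's implementation: the bigram table standing in for the LM prior, the weight `alpha`, and the function name `beam_decode` are all assumptions for illustration.

```python
import numpy as np

def beam_decode(obf_embs, emb_table, log_bigram, sigma, beam_width=3, alpha=1.0):
    """Toy beam search over token sequences.

    obf_embs:   (T, d) obfuscated embeddings (token embedding + Gaussian noise)
    emb_table:  (V, d) public embedding table
    log_bigram: (V, V) log p(next | prev), a toy stand-in for an LM prior
    sigma:      assumed (or estimated) Gaussian noise scale
    Scores each hypothesis by Gaussian log-likelihood + alpha * LM prior.
    """
    T = obf_embs.shape[0]
    V = emb_table.shape[0]
    # Per-position Gaussian log-likelihood of every vocab entry (up to a constant).
    sq_dists = ((obf_embs[:, None, :] - emb_table[None, :, :]) ** 2).sum(-1)  # (T, V)
    loglik = -sq_dists / (2.0 * sigma ** 2)
    beams = [([], 0.0)]  # (token sequence, cumulative score)
    for t in range(T):
        candidates = []
        for seq, score in beams:
            for v in range(V):
                prior = log_bigram[seq[-1], v] if seq else 0.0
                candidates.append((seq + [v], score + loglik[t, v] + alpha * prior))
        candidates.sort(key=lambda c: -c[1])
        beams = candidates[:beam_width]
    return beams[0][0]
```

With a uniform prior this reduces to a nearest-neighbor (distance-based) attack; the gain reported in the paper comes from the prior reweighting hypotheses that are linguistically plausible, which a purely distance-based decoder cannot do.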