Zero2Text: Zero-Training Cross-Domain Inversion Attacks on Textual Embeddings

πŸ“… 2026-02-02
πŸ“ˆ Citations: 1
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses the challenge of embedding inversion under strict black-box and cross-domain settings, where existing methods struggle to preserve vector database privacy due to their reliance on extensive queries or in-domain training data. We propose the first training-free cross-domain embedding inversion framework, which leverages a recursive online alignment mechanism that integrates large language model priors with dynamic ridge regression to generate text aligned with target embeddings in real timeβ€”without any training or prior knowledge of the target domain. Our approach eliminates dependence on static datasets or high query budgets and exposes limitations of conventional defenses such as differential privacy. Experiments demonstrate significant improvements over baselines across multiple benchmarks including MS MARCO, achieving a 1.8Γ— increase in ROUGE-L and a 6.4Γ— gain in BLEU-2 against OpenAI models, successfully reconstructing original sentences from unseen domains.

πŸ“ Abstract
The proliferation of retrieval-augmented generation (RAG) has established vector databases as critical infrastructure, yet they introduce severe privacy risks via embedding inversion attacks. Existing paradigms face a fundamental trade-off: optimization-based methods require computationally prohibitive queries, while alignment-based approaches hinge on the unrealistic assumption of accessible in-domain training data. These constraints render them ineffective in strict black-box and cross-domain settings. To dismantle these barriers, we introduce Zero2Text, a novel training-free framework based on recursive online alignment. Unlike methods relying on static datasets, Zero2Text synergizes LLM priors with a dynamic ridge regression mechanism to iteratively align generation to the target embedding on-the-fly. We further demonstrate that standard defenses, such as differential privacy, fail to effectively mitigate this adaptive threat. Extensive experiments across diverse benchmarks validate Zero2Text; notably, on MS MARCO against the OpenAI victim model, it achieves 1.8x higher ROUGE-L and 6.4x higher BLEU-2 scores compared to baselines, recovering sentences from unknown domains without a single leaked data pair.
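The abstract describes the core loop: an LLM proposes candidate texts, a black-box victim encoder embeds them, and a ridge regression model refit on all queries so far steers the next round of generation toward the target embedding. The paper does not publish pseudocode here, so the following is only a minimal illustrative sketch of that loop; the function names, interfaces (`embed`, `propose`), and the exact role of the ridge scorer are assumptions, not the authors' implementation.

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression: w = (X^T X + lam*I)^-1 X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def invert_embedding(target_emb, embed, propose, rounds=3, k=8):
    """Toy recursive-online-alignment loop (interfaces are assumptions).

    target_emb -- the leaked vector to invert
    embed      -- black-box text -> vector oracle (the victim encoder)
    propose    -- LLM-style candidate generator: (seed_text, k) -> list[str]
    """
    X_hist, y_hist = [], []
    seed, best_text, best_sim = "", "", -1.0
    for _ in range(rounds):
        candidates = propose(seed, k)                 # LLM prior step
        E = np.stack([embed(t) for t in candidates])
        # Cosine similarity of each candidate to the target embedding
        sims = E @ target_emb / (
            np.linalg.norm(E, axis=1) * np.linalg.norm(target_emb) + 1e-9)
        j = int(np.argmax(sims))
        if sims[j] > best_sim:
            best_sim, best_text = float(sims[j]), candidates[j]
        # "Dynamic" ridge step: refit on every query made so far, then let
        # the fitted scorer pick the seed for the next generation round.
        X_hist.append(E)
        y_hist.append(sims)
        w = ridge_fit(np.vstack(X_hist), np.concatenate(y_hist))
        seed = candidates[int(np.argmax(E @ w))]
    return best_text, best_sim
```

Note that nothing is trained offline: the only learned component is the ridge model, refit online from this attack's own queries, which is what makes the approach training-free and domain-agnostic.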
Problem

Research questions and friction points this paper is trying to address.

embedding inversion
cross-domain
black-box attack
privacy risk
retrieval-augmented generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

zero-training
cross-domain inversion
recursive online alignment
embedding inversion attack
training-free framework
πŸ”Ž Similar Papers
No similar papers found.
Doohyun Kim
Graduate School of Information Security, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea
Donghwa Kang
KAIST
DNN · Real-Time Systems · SNN · AI Security
Kyungjae Lee
University of Seoul
Natural Language Processing · Language Models · Question Answering
Hyeongboo Baek
Department of Artificial Intelligence, University of Seoul, Seoul, Republic of Korea
Brent Byunghoon Kang
School of Computing, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea