🤖 AI Summary
This work tackles the problem of accurately and efficiently reconstructing original text from embedding vectors without access to the target encoder. The authors propose an inversion framework based on conditional masked diffusion, which reformulates the task as parallel iterative denoising rather than conventional sequential autoregressive generation. By conditioning a masked diffusion language model on the target embedding through adaptive layer normalization, the method achieves high-fidelity reconstruction in only eight forward passes. On 32-token sequences, it reaches up to 81.3% token-level accuracy across three mainstream embedding models, improving both the efficiency and the effectiveness of embedding inversion.
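The parallel denoising process described above can be sketched as follows. This is a minimal, hypothetical illustration (not the paper's implementation): `predict_logits` stands in for the masked diffusion language model, and the growing commit schedule is an assumed MaskGIT-style unmasking rule.

```python
import numpy as np

VOCAB, SEQ_LEN, MASK, STEPS = 100, 32, -1, 8
rng = np.random.default_rng(0)

def predict_logits(tokens, embedding):
    # Stand-in for the diffusion LM: a single parallel forward pass
    # yields logits for every position at once (here: random scores).
    return rng.standard_normal((SEQ_LEN, VOCAB))

def invert(embedding):
    tokens = np.full(SEQ_LEN, MASK)  # start from a fully masked sequence
    for step in range(STEPS):
        logits = predict_logits(tokens, embedding)
        probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
        conf, pred = probs.max(-1), probs.argmax(-1)
        # Commit the most confident predictions this round; the number of
        # committed positions grows each step until all 32 are filled.
        n_keep = int(SEQ_LEN * (step + 1) / STEPS)
        keep = np.argsort(-conf)[:n_keep]
        tokens[keep] = pred[keep]
    return tokens

result = invert(np.zeros(768))  # 768-dim embedding is an assumed size
```

After the eighth pass every position has been committed, so the full 32-token sequence is recovered in a fixed number of forward passes, independent of sequence length.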
📝 Abstract
We frame embedding inversion as conditional masked diffusion, recovering all tokens in parallel through iterative denoising rather than sequential autoregressive generation. A masked diffusion language model is conditioned on the target embedding via adaptive layer normalization, requiring only 8 forward passes through a 78M-parameter model with no access to the target encoder. On 32-token sequences across three embedding models, the method achieves up to 81.3% token accuracy. Source code and live demo are available at https://github.com/jina-ai/embedding-inversion-demo.
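The adaptive layer normalization conditioning mentioned in the abstract can be sketched as below. This is an illustrative assumption about the mechanism, not the paper's code: the projection matrices `W_scale` and `W_shift` and the dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_embed = 64, 768  # assumed hidden and embedding sizes
W_scale = rng.standard_normal((d_embed, d_model)) * 0.02
W_shift = rng.standard_normal((d_embed, d_model)) * 0.02

def ada_layer_norm(h, embedding, eps=1e-5):
    # Standard LayerNorm over the feature axis...
    mu = h.mean(-1, keepdims=True)
    var = h.var(-1, keepdims=True)
    normed = (h - mu) / np.sqrt(var + eps)
    # ...but the scale and shift are predicted from the conditioning
    # embedding instead of being fixed learned constants, so the target
    # embedding steers every normalized layer of the denoiser.
    scale = 1.0 + embedding @ W_scale
    shift = embedding @ W_shift
    return normed * scale + shift

h = rng.standard_normal((32, d_model))  # hidden states for 32 tokens
e = rng.standard_normal(d_embed)        # target embedding to invert
out = ada_layer_norm(h, e)
```

Injecting the condition through the normalization layers, rather than through cross-attention or prompt tokens, keeps the conditioning pathway cheap, which matters for a small 78M-parameter model.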