Language Models are Injective and Hence Invertible

📅 2025-10-17
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work investigates the injectivity of Transformer language models: whether discrete input sequences are mapped losslessly to unique continuous hidden representations. Through mathematical proof and extensive empirical validation (billions of collision tests), we establish that mainstream large language models are injective both at initialization and after training, so the mapping from inputs to hidden activations is invertible on its image. Building on this theoretical foundation, we propose SipIt, an algorithm that provably reconstructs the original input from hidden activations in linear time with exact fidelity. Evaluated across six major LLMs, SipIt achieves 100% text reconstruction, with no representation collisions observed. Our results provide the first formal guarantee of input recoverability in Transformers, establishing a rigorous theoretical basis for model transparency, interpretability, and internal representation analysis, while delivering a practical, scalable tool for probing latent-space structure.
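The empirical side of the claim can be illustrated with a collision test: sample many distinct input sequences, hash each sequence's hidden-state trajectory, and count cases where two different inputs produce identical representations. The sketch below uses a hypothetical toy causal "model" (a scalar recurrence standing in for a transformer), not the paper's actual setup:

```python
import hashlib
import random

VOCAB = 32  # toy vocabulary size (assumption for illustration)

def hidden_states(tokens):
    # Toy causal "model": each state mixes the previous state with a
    # distinct per-token "embedding" (tok + 1). Hypothetical stand-in
    # for a transformer's hidden activations.
    h, out = 0.0, []
    for tok in tokens:
        h = 0.5 * h + float(tok + 1)
        out.append(h)
    return tuple(out)

def collision_test(n_samples=10000, seq_len=8, seed=0):
    # Hash each sequence's full hidden trajectory; a "collision" is two
    # distinct input sequences sharing the same trajectory hash.
    rng = random.Random(seed)
    seen, collisions = {}, 0
    for _ in range(n_samples):
        toks = tuple(rng.randrange(VOCAB) for _ in range(seq_len))
        key = hashlib.sha256(repr(hidden_states(toks)).encode()).hexdigest()
        prev = seen.setdefault(key, toks)
        if prev != toks:
            collisions += 1
    return collisions
```

Because the toy recurrence is injective (the token is recoverable as `h_t - 0.5 * h_{t-1} - 1`), the test reports zero collisions, mirroring what the paper observes at much larger scale on real models.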

📝 Abstract
Transformer components such as non-linear activations and normalization are inherently non-injective, suggesting that different inputs could map to the same output and prevent exact recovery of the input from a model's representations. In this paper, we challenge this view. First, we prove mathematically that transformer language models mapping discrete input sequences to their corresponding sequence of continuous representations are injective and therefore lossless, a property established at initialization and preserved during training. Second, we confirm this result empirically through billions of collision tests on six state-of-the-art language models, and observe no collisions. Third, we operationalize injectivity: we introduce SipIt, the first algorithm that provably and efficiently reconstructs the exact input text from hidden activations, establishing linear-time guarantees and demonstrating exact invertibility in practice. Overall, our work establishes injectivity as a fundamental and exploitable property of language models, with direct implications for transparency, interpretability, and safe deployment.
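The reconstruction strategy the abstract describes can be sketched as a greedy, token-by-token search: given the hidden states, recover each token by finding the unique vocabulary entry whose causal update reproduces the observed state. This is a minimal hypothetical sketch of the idea behind such an inverter, not the authors' SipIt implementation, and it again uses a toy scalar recurrence in place of a real transformer:

```python
VOCAB = 32  # toy vocabulary size (assumption for illustration)

def step(h_prev, tok):
    # One toy causal update; (tok + 1) acts as a distinct per-token embedding.
    return 0.5 * h_prev + float(tok + 1)

def hidden_states(tokens):
    h, out = 0.0, []
    for tok in tokens:
        h = step(h, tok)
        out.append(h)
    return out

def invert(states, tol=1e-9):
    # Greedy reconstruction: at each position, pick the vocabulary token
    # whose update matches the observed state, then advance. Injectivity
    # guarantees the match is unique; total work is linear in sequence
    # length (times a vocabulary scan per position).
    tokens, h = [], 0.0
    for target in states:
        for v in range(VOCAB):
            if abs(step(h, v) - target) < tol:
                tokens.append(v)
                h = target
                break
        else:
            raise ValueError("no token matches: model is not injective here")
    return tokens
```

Under these assumptions, `invert(hidden_states(seq))` returns `seq` exactly; the paper's contribution is proving that real transformer language models admit the same kind of exact, efficient inversion.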
Problem

Research questions and friction points this paper is trying to address.

Proving that transformer language models are injective and lossless
Empirically confirming the absence of collisions in language model representations
Developing an efficient algorithm to reconstruct inputs from hidden activations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proves that transformer language models are injective and lossless
Introduces the SipIt algorithm for exact input reconstruction
Establishes linear-time guarantees and demonstrates exact invertibility in practice