Rep2Text: Decoding Full Text from a Single LLM Token Representation

πŸ“… 2025-11-09
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This study investigates how much of the original input information is preserved in the final-layer last-token representation of large language models (LLMs). To quantify this, the authors propose Rep2Text, a lightweight trainable adapter that maps a single last-token representation into the embedding space of a decoder LLM, which then reconstructs the input text autoregressively. To the authors' knowledge, this is the first method to efficiently recover multi-token inputs (e.g., 16-token sequences) from a single last-token representation, revealing substantial redundancy in LLM internal representations. Rep2Text supports cross-architecture composition (e.g., using Llama's representations to drive OPT decoding) and maintains high semantic fidelity and textual coherence on both in-distribution and out-of-distribution medical texts. Empirical results show an average input information recovery rate exceeding 50%, while an information bottleneck effect emerges for longer sequences: token-level recovery drops even as semantic integrity is preserved.

πŸ“ Abstract
Large language models (LLMs) have achieved remarkable progress across diverse tasks, yet their internal mechanisms remain largely opaque. In this work, we address a fundamental question: to what extent can the original input text be recovered from a single last-token representation within an LLM? We propose Rep2Text, a novel framework for decoding full text from last-token representations. Rep2Text employs a trainable adapter that projects a target model's internal representations into the embedding space of a decoding language model, which then autoregressively reconstructs the input text. Experiments on various model combinations (Llama-3.1-8B, Gemma-7B, Mistral-7B-v0.1, Llama-3.2-3B) demonstrate that, on average, over half of the information in 16-token sequences can be recovered from this compressed representation while maintaining strong semantic integrity and coherence. Furthermore, our analysis reveals an information bottleneck effect: longer sequences exhibit decreased token-level recovery while preserving strong semantic integrity. Moreover, our framework demonstrates robust generalization to out-of-distribution medical data.
Problem

Research questions and friction points this paper is trying to address.

Recovering original input text from single LLM token representations
Decoding full text sequences using compressed last-token embeddings
Addressing information bottleneck in LLM internal representation compression
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decodes full text from single token representations
Projects internal representations via trainable adapter
Autoregressively reconstructs input with decoding language model
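The adapter pipeline described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the hidden sizes (4096 for the target model, 2048 for the decoder), the single linear projection, and all variable names are assumptions for the sketch; the actual Rep2Text adapter architecture and training setup may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): target-model hidden size
# and decoder embedding size.
D_TARGET, D_DECODER = 4096, 2048

# Frozen target LLM's final-layer representation of the last input token.
h_last = rng.standard_normal(D_TARGET)

# Trainable linear adapter: projects the target representation into the
# decoder LLM's token-embedding space (a simplified stand-in for the
# Rep2Text adapter, which is trained while both LLMs stay frozen).
W = rng.standard_normal((D_DECODER, D_TARGET)) * 0.01
b = np.zeros(D_DECODER)

def adapt(h: np.ndarray) -> np.ndarray:
    """Map a target-model hidden state to a decoder-space 'soft prompt'."""
    return W @ h + b

soft_prompt = adapt(h_last)  # shape: (D_DECODER,)

# In the full framework, this vector is prepended to the decoder's input
# embeddings, and the decoder autoregressively generates tokens trained
# to reconstruct the original (e.g., 16-token) input sequence.
print(soft_prompt.shape)
```

The key design point the sketch captures is that the adapter is the only trained component: it bridges two frozen models by translating one model's representation space into another's embedding space.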