AI Summary
This study investigates the extent of original input information preserved in the final-layer token representation of large language models (LLMs). To quantify this, we propose Rep2Text, a lightweight, trainable adapter that maps a single output token into the embedding space of a decoder LLM and reconstructs the input text autoregressively. To our knowledge, this is the first method enabling efficient recovery of multi-token inputs (e.g., 16-token sequences) from a single terminal token, revealing substantial yet underutilized internal redundancy in LLMs and challenging conventional information bottleneck assumptions. Rep2Text supports cross-architecture composition (e.g., using Llama's representations to drive OPT decoding) and maintains high semantic fidelity and textual coherence on both in-distribution and out-of-distribution medical texts. Empirical results show an average input information recovery rate exceeding 50%.
Abstract
Large language models (LLMs) have achieved remarkable progress across diverse tasks, yet their internal mechanisms remain largely opaque. In this work, we address a fundamental question: to what extent can the original input text be recovered from a single last-token representation within an LLM? We propose Rep2Text, a novel framework for decoding full text from last-token representations. Rep2Text employs a trainable adapter that projects a target model's internal representations into the embedding space of a decoding language model, which then autoregressively reconstructs the input text. Experiments on various model combinations (Llama-3.1-8B, Gemma-7B, Mistral-7B-v0.1, Llama-3.2-3B) demonstrate that, on average, over half of the information in 16-token sequences can be recovered from this compressed representation while maintaining strong semantic integrity and coherence. Furthermore, our analysis reveals an information bottleneck effect: longer sequences exhibit decreased token-level recovery while still preserving strong semantic integrity. Finally, our framework demonstrates robust generalization to out-of-distribution medical data.
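The adapter idea described above can be sketched in a few lines: a trainable map takes one last-token representation from the target model and produces a short prefix of embeddings in the decoder model's space, which then conditions autoregressive reconstruction. The sketch below uses numpy and a plain linear map; the dimensions, the prefix length `k_prefix`, and the single-layer form are illustrative assumptions, not the paper's actual architecture or sizes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not the paper's actual sizes).
d_target = 4096   # hidden size of the target model (e.g. Llama-3.1-8B)
d_dec = 2048      # embedding size of the decoding model
k_prefix = 4      # number of decoder-space embeddings the adapter emits

# Trainable adapter parameters: a single linear map from one last-token
# representation to k_prefix decoder embeddings (a minimal stand-in; the
# actual Rep2Text adapter may be deeper).
W = rng.normal(0.0, 0.02, size=(d_target, k_prefix * d_dec))
b = np.zeros(k_prefix * d_dec)

def adapt(last_token_rep: np.ndarray) -> np.ndarray:
    """Project a single last-token representation into a soft prefix of
    decoder embeddings that would condition autoregressive decoding."""
    prefix = last_token_rep @ W + b
    return prefix.reshape(k_prefix, d_dec)

# A random stand-in for the target model's final-layer last-token state.
h_last = rng.normal(size=(d_target,))
soft_prefix = adapt(h_last)
print(soft_prefix.shape)  # (4, 2048)
```

In a real setup the resulting `soft_prefix` would be fed to the decoder LLM in place of token embeddings (e.g. via an `inputs_embeds`-style interface), and the adapter weights would be trained so that the decoder reconstructs the original input text token by token.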