🤖 AI Summary
This work addresses the frequent misalignment between internal representations and natural language outputs in large language models. To bridge this gap, the authors propose a recurrent Transformer architecture that extends computational depth through multiple iterations of shared layers, functioning as an introspective mechanism. The study presents the first systematic evaluation of such architectures in aligning internal representations with generated language. Findings reveal that while increasing recurrence depth narrows the representation–output gap, part of this improvement stems from degradation of internal knowledge. Moreover, the model exhibits representational awareness only in its final recurrence step, failing to achieve genuine cross-iteration introspection. These results highlight both the potential and inherent limitations of recurrent structures in enhancing a model’s self-reflective capabilities.
📝 Abstract
Large Language Models (LLMs) often exhibit a gap between their internal knowledge and their explicit linguistic outputs. In this report, we empirically investigate whether Looped Transformers (LTs)--architectures that increase computational depth by iterating shared layers--can bridge this gap by using their iterative nature as a form of introspection. Our experiments reveal that while increasing loop iterations narrows the gap, the improvement is partly driven by a degradation of the internal knowledge carried by their representations. Moreover, a further empirical analysis suggests that current LTs' ability to perceive their representations does not improve across loops; it is present only in the final loop. These results suggest that while LTs offer a promising direction for scaling computational depth, they have yet to achieve the introspection required to truly link representation space and natural language.
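The core architectural idea--one block of shared weights applied for a configurable number of loop iterations, so effective depth grows without adding parameters--can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the class and function names are hypothetical, and the block is reduced to a residual feed-forward update (a real LT block would also include attention and normalization).

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU, a common Transformer nonlinearity
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

class LoopedBlock:
    """A single weight-shared block, reused at every loop iteration (sketch)."""

    def __init__(self, d_model, rng):
        # One fixed set of parameters, regardless of how many loops are run.
        self.w1 = rng.normal(0.0, 0.02, (d_model, 4 * d_model))
        self.w2 = rng.normal(0.0, 0.02, (4 * d_model, d_model))

    def __call__(self, h):
        # Residual feed-forward update on the hidden states.
        return h + gelu(h @ self.w1) @ self.w2

def looped_forward(h, block, n_loops):
    # Effective computational depth scales with n_loops,
    # while the parameter count stays constant.
    for _ in range(n_loops):
        h = block(h)
    return h

rng = np.random.default_rng(0)
block = LoopedBlock(d_model=16, rng=rng)
x = rng.normal(size=(4, 16))  # 4 token positions, 16-dim hidden states

out_shallow = looped_forward(x, block, n_loops=1)
out_deep = looped_forward(x, block, n_loops=8)
```

Under this setup, varying `n_loops` at inference time changes only how many times the same block is applied, which is the knob the experiments above turn when measuring the representation-output gap.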