🤖 AI Summary
This work addresses the inference redundancy arising from self-reflection in large language models (LLMs). We propose a fine-grained method, based on latent-space representations, for dynamically controlling reflection frequency. Our core contribution is the first formulation of self-reflection as an extractable and manipulable direction in the model's latent space, revealing its strong correlation with internal uncertainty signals and enabling uncertainty-driven adaptive reflection. The method combines representation engineering, latent-direction analysis, stepwise latent-space steering, and uncertainty estimation. Experiments on mathematical reasoning and code generation tasks show that our approach reduces inference token consumption by up to 33.6% without degrading performance, confirming that strong LLMs exhibit substantial yet controllable reflective redundancy. The mechanism also generalizes across tasks.
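As a rough illustration of the extraction step, the sketch below uses the standard difference-of-means recipe from representation engineering: average the hidden states of reflection steps and of non-reflection steps, take the difference as the reflection direction, and shift activations along it to encourage or suppress reflection. All tensor shapes, the step-labeling scheme, and the `steer` helper are illustrative assumptions, not the paper's exact procedure.

```python
# Minimal sketch of a difference-of-means "reflection direction".
# All tensors here are synthetic stand-ins for real step-level hidden states.
import torch

hidden_dim = 4096

# Stand-ins for mean-pooled hidden states of reasoning steps labeled as
# reflection vs. non-reflection (e.g., via markers like "wait, let me re-check").
reflection_states = torch.randn(128, hidden_dim)  # (num_reflection_steps, d)
other_states = torch.randn(512, hidden_dim)       # (num_other_steps, d)

# Reflection direction: difference of the two class means, normalized.
direction = reflection_states.mean(dim=0) - other_states.mean(dim=0)
direction = direction / direction.norm()

def steer(hidden: torch.Tensor, alpha: float) -> torch.Tensor:
    """Shift hidden states along the reflection direction.

    alpha > 0 encourages reflection; alpha < 0 suppresses it. In practice
    this would run inside a forward hook at a chosen layer, applied once
    per reasoning step ("stepwise steering").
    """
    return hidden + alpha * direction

# Example: suppress reflection for one (batch, seq, d) activation tensor.
h = torch.randn(1, 16, hidden_dim)
h_steered = steer(h, alpha=-4.0)
```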
📝 Abstract
Large language models (LLMs) with Chain-of-Thought (CoT) reasoning have achieved strong performance across diverse tasks, including mathematics, coding, and general reasoning. A distinctive capability of these reasoning models is self-reflection: the ability to review and revise previous reasoning steps. While self-reflection enhances reasoning performance, it also increases inference cost. In this work, we study self-reflection through the lens of representation engineering. We segment the model's reasoning into steps, identify the steps corresponding to reflection, and extract a reflection direction in the latent space that governs this behavior. Using this direction, we propose a stepwise steering method, which we call ReflCtrl, that controls reflection frequency. Our experiments show that (1) in many cases reflections are redundant, especially in stronger models (in our experiments, we save up to 33.6% of reasoning tokens while preserving performance), and (2) the model's reflection behavior is highly correlated with an internal uncertainty signal, suggesting that self-reflection may be governed by the model's uncertainty.
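To make finding (2) concrete, here is a hedged sketch of one way an internal uncertainty signal could gate stepwise steering: estimate per-step uncertainty from the mean entropy of the next-token distributions decoded within that step, and only suppress reflection on low-uncertainty steps. The function names, threshold, and steering strength below are hypothetical, not values from the paper.

```python
# Hypothetical uncertainty-gated steering: confident steps get reflection
# suppressed; uncertain steps are left untouched.
import torch

def step_uncertainty(step_logits: torch.Tensor) -> float:
    """Mean next-token entropy over the tokens of one reasoning step.

    step_logits: (num_tokens, vocab_size) logits produced while decoding
    the step.
    """
    log_probs = torch.log_softmax(step_logits, dim=-1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1)  # (num_tokens,)
    return entropy.mean().item()

def steering_alpha(uncertainty: float, threshold: float = 2.0,
                   suppress: float = -4.0) -> float:
    """Return the steering strength for one step: suppress reflection only
    when the model looks confident (low entropy)."""
    return 0.0 if uncertainty >= threshold else suppress

# Example with random logits standing in for a decoded step.
logits = torch.randn(32, 32000)  # (tokens_in_step, vocab_size)
alpha = steering_alpha(step_uncertainty(logits))
```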