ReflCtrl: Controlling LLM Reflection via Representation Engineering

📅 2025-12-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses inference redundancy arising from self-reflection in large language models (LLMs). We propose a fine-grained, latent-space representation–based method for dynamically controlling reflection frequency. Our core contribution is the first formulation of self-reflection as an extractable and manipulable direction in the model’s latent space—revealing its strong correlation with internal uncertainty signals—and leveraging this insight to enable uncertainty-driven adaptive reflection. The method integrates representation engineering, latent-direction analysis, stepwise latent-space guidance, and uncertainty estimation. Experiments across mathematical reasoning and code generation tasks demonstrate that our approach reduces inference token consumption by up to 33.6% without performance degradation, confirming the presence of substantial yet controllable reflective redundancy in strong LLMs. Moreover, the mechanism exhibits cross-task generalizability.

📝 Abstract
Large language models (LLMs) with Chain-of-Thought (CoT) reasoning have achieved strong performance across diverse tasks, including mathematics, coding, and general reasoning. A distinctive capability of these reasoning models is self-reflection: reviewing and revising previous reasoning steps. While self-reflection improves reasoning performance, it also increases inference cost. In this work, we study self-reflection through the lens of representation engineering. We segment the model's reasoning into steps, identify the steps corresponding to reflection, and extract a reflection direction in the latent space that governs this behavior. Using this direction, we propose a stepwise steering method that controls reflection frequency. We call our framework ReflCtrl. Our experiments show that (1) in many cases reflections are redundant, especially in stronger models (in our experiments, we save up to 33.6% of reasoning tokens while preserving performance), and (2) the model's reflection behavior is highly correlated with an internal uncertainty signal, suggesting that self-reflection may be governed by the model's uncertainty.
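The abstract describes extracting a "reflection direction" from step-level latent representations. A common way to obtain such a direction in representation engineering is a difference of means between the two classes of activations; the sketch below assumes that approach, with one hidden-state vector per reasoning step and a boolean label marking reflection steps. The function name and data layout are illustrative assumptions, not the authors' released code.

```python
import numpy as np

def reflection_direction(step_activations, is_reflection):
    """Difference-of-means sketch: mean activation of reflection steps
    minus mean activation of non-reflection steps, normalized to unit length.

    step_activations: (num_steps, hidden_dim) array-like, one vector per step
    is_reflection:    (num_steps,) boolean labels for reflection steps
    """
    acts = np.asarray(step_activations, dtype=float)
    mask = np.asarray(is_reflection, dtype=bool)
    direction = acts[mask].mean(axis=0) - acts[~mask].mean(axis=0)
    return direction / np.linalg.norm(direction)  # unit steering vector
```

With this convention, a positive projection of a step's hidden state onto the returned vector indicates reflection-like activity, which is what makes the direction usable for both detection and steering.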
Problem

Research questions and friction points this paper is trying to address.

Control reflection frequency in LLMs via representation engineering
Reduce inference cost by eliminating redundant self-reflection steps
Link reflection behavior to internal uncertainty for efficient steering
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extracts reflection direction in latent space
Proposes stepwise steering to control reflection frequency
Correlates reflection behavior with internal uncertainty signal
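The innovations above combine latent-space steering with an uncertainty signal. A minimal sketch of how the two could fit together: add a scaled copy of the reflection direction to a step's hidden state, with the scale chosen from the model's current uncertainty (here approximated by token entropy with a fixed threshold). The injection point, threshold, and strength values are assumptions for illustration; the paper's actual schedule may differ.

```python
import numpy as np

def steer_step(hidden, direction, alpha):
    """Shift one step's hidden state along the reflection direction.
    alpha > 0 encourages reflection; alpha < 0 suppresses it."""
    return np.asarray(hidden, dtype=float) + alpha * np.asarray(direction, dtype=float)

def adaptive_alpha(token_entropy, threshold=1.0, strength=4.0):
    """Uncertainty-driven schedule (illustrative): when entropy is below
    the threshold the model is confident, so suppress redundant reflection;
    otherwise leave the hidden state untouched."""
    return -strength if token_entropy < threshold else 0.0
```

In this sketch, low-uncertainty steps get a negative steering coefficient, which is one way to realize the paper's finding that reflection can be skipped without performance loss when the model is already confident.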