🤖 AI Summary
Static self-reflection in large language models (LLMs) often induces redundancy, reasoning drift, and stubbornness, and it offers no reliable way for a model to correct itself without external feedback. Method: We propose Instruct-of-Reflection (IoRT), an iterative reflection framework driven by dynamic meta-instructions. IoRT introduces a multi-strategy meta-instruction mechanism that can refresh, terminate, or select among responses, integrating a meta-cognitive instruction generator and a self-consistency classifier to enable interpretable intervention and adaptive termination of reflection. It comprises three core components: instruction-driven reflection control, iterative response reweighting, and dynamic termination. Contribution/Results: Evaluated on mathematical and commonsense reasoning benchmarks, IoRT achieves an average 10.1% improvement over strong baselines, including Chain-of-Thought and Self-Refine, demonstrating the first systematic realization of controllable, interpretable, and adaptively optimized reflection behavior in LLMs.
📝 Abstract
Self-reflection for Large Language Models (LLMs) has gained significant attention. Existing approaches have models iteratively improve their previous responses, relying either on the LLM's internal reflection ability or on external feedback. However, recent research has questioned whether intrinsic self-correction without external feedback is effective, suggesting that it may even degrade performance. Based on our empirical evidence, we find that current static reflection methods can suffer from redundancy, drift, and stubbornness. To mitigate these issues, we introduce Instruct-of-Reflection (IoRT), a novel and general reflection framework that leverages dynamic meta-instructions to enhance the iterative reflection capability of LLMs. Specifically, we propose an instructor, driven by meta-thoughts and a self-consistency classifier, that generates one of several instructions (refresh, stop, or select) to guide the next reflection iteration. Our experiments demonstrate that IoRT achieves an average improvement of 10.1% over established baselines on mathematical and commonsense reasoning tasks, highlighting its efficacy and applicability.
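To make the control flow concrete, the following is a minimal sketch of an instruction-driven reflection loop. It is not the paper's implementation: the decision rule (stop when two answers agree, select when a judge prefers one, refresh otherwise) is an assumption made for illustration, and the names `instruct`, `iterative_reflection`, `generate`, `reflect`, and `judge` are hypothetical placeholders for model calls.

```python
from typing import Callable, Optional, Tuple


def normalize(answer: str) -> str:
    """Crude stand-in for a self-consistency check on final answers."""
    return answer.strip().lower()


def instruct(
    basic: str,
    reflected: str,
    judge: Callable[[str, str], Optional[str]],
) -> Tuple[str, Optional[str]]:
    """Hypothetical instructor: returns (instruction, answer).

    Assumed rule, not taken from the paper:
      - answers agree           -> "stop"
      - judge prefers one       -> "select" that answer
      - judge rejects both      -> "refresh" (regenerate from scratch)
    """
    if normalize(basic) == normalize(reflected):
        return "stop", reflected
    choice = judge(basic, reflected)
    if choice is None:
        return "refresh", None
    return "select", choice


def iterative_reflection(
    question: str,
    generate: Callable[[str], str],
    reflect: Callable[[str, str], str],
    judge: Callable[[str, str], Optional[str]],
    max_iters: int = 4,
) -> str:
    """Run reflection until the instructor says stop or the budget runs out."""
    current = generate(question)
    for _ in range(max_iters):
        candidate = reflect(question, current)
        instruction, answer = instruct(current, candidate, judge)
        if instruction == "stop":
            return answer
        if instruction == "select":
            current = answer
        else:  # "refresh": discard both answers and start over
            current = generate(question)
    return current
```

In practice `generate`, `reflect`, and `judge` would each wrap an LLM call; here they are plain callables so the loop can be exercised with stubs. The iteration cap `max_iters` is also an assumption, added so that the sketch always terminates.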