Instruct-of-Reflection: Enhancing Large Language Models Iterative Reflection Capabilities via Dynamic-Meta Instruction

📅 2025-03-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Static self-reflection in large language models (LLMs) often leads to redundancy, reasoning drift, and stubbornness, and without external feedback it offers no intrinsic mechanism for correcting these failures. Method: We propose IoRT, an iterative reflection framework driven by dynamic meta-instructions. IoRT introduces a multi-strategy meta-instruction mechanism that can refresh, terminate, or select among responses, combining a meta-cognitive instruction generator with a self-consistency classifier to enable interpretable intervention and adaptive termination of reflection. It comprises three core components: instruction-driven reflection control, iterative response reweighting, and dynamic termination. Contribution/Results: Evaluated on mathematical and commonsense reasoning benchmarks, IoRT achieves an average 10.1% improvement over strong baselines, including Chain-of-Thought and Self-Refine, suggesting that reflection behavior in LLMs can be made controllable, interpretable, and adaptively terminated.

📝 Abstract

Self-reflection for Large Language Models (LLMs) has gained significant attention. Existing approaches have models iteratively improve their previous responses, relying either on the LLM's internal reflection ability or on external feedback. However, recent research has raised concerns that intrinsic self-correction without external feedback may even degrade performance. Based on our empirical evidence, we find that current static reflection methods can suffer from redundancy, drift, and stubbornness. To mitigate this, we introduce Instruct-of-Reflection (IoRT), a novel and general reflection framework that leverages dynamic-meta instruction to enhance the iterative reflection capability of LLMs. Specifically, we propose an instructor that, driven by meta-thoughts and a self-consistency classifier, generates one of several instructions (refresh, stop, or select) to guide the next reflection iteration. Our experiments demonstrate that IoRT achieves an average improvement of 10.1% over established baselines in mathematical and commonsense reasoning tasks, highlighting its efficacy and applicability.
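
The control flow described in the abstract, where an instructor issues refresh, stop, or select before each new reflection iteration, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration rather than the authors' implementation: the function names, the exact-match consistency check, and in particular the decision rule (consistent and judged correct means stop, consistent but judged wrong means refresh, inconsistent means select) are hypothetical. The callables `reflect`, `regenerate`, `looks_correct`, and `prefer` stand in for LLM calls and meta-thought-guided judgments that the abstract does not specify.

```python
from enum import Enum
from typing import Callable, List, Tuple


class Instruction(Enum):
    REFRESH = "refresh"
    STOP = "stop"
    SELECT = "select"


def is_consistent(previous: str, reflected: str) -> bool:
    # Toy stand-in for the self-consistency classifier: normalized exact match.
    return previous.strip().lower() == reflected.strip().lower()


def instruct(previous: str, reflected: str,
             looks_correct: Callable[[str], bool]) -> Instruction:
    # Hypothetical decision rule (an assumption, not taken from the source).
    if is_consistent(previous, reflected):
        return Instruction.STOP if looks_correct(reflected) else Instruction.REFRESH
    return Instruction.SELECT


def reflect_with_instructions(
    initial: str,
    reflect: Callable[[str], str],         # stand-in for one LLM reflection step
    regenerate: Callable[[], str],         # stand-in for a fresh LLM answer
    looks_correct: Callable[[str], bool],  # stand-in for meta-thought-guided judgment
    prefer: Callable[[str, str], str],     # stand-in for choosing between two answers
    max_iters: int = 4,
) -> Tuple[str, List[Instruction]]:
    current = initial
    trace: List[Instruction] = []
    for _ in range(max_iters):
        candidate = reflect(current)
        instruction = instruct(current, candidate, looks_correct)
        trace.append(instruction)
        if instruction is Instruction.STOP:
            return candidate, trace            # terminate reflection early
        if instruction is Instruction.SELECT:
            current = prefer(current, candidate)
        else:                                  # REFRESH: discard and start over
            current = regenerate()
    return current, trace


# Toy run: the first reflection disagrees with the initial answer (select),
# the second agrees and is judged correct (stop).
answer, trace = reflect_with_instructions(
    initial="41",
    reflect=lambda _previous: "42",
    regenerate=lambda: "42",
    looks_correct=lambda a: a == "42",
    prefer=lambda old, new: new,
)
print(answer, [i.value for i in trace])  # 42 ['select', 'stop']
```

In this sketch the stop instruction is what prevents redundant extra iterations, and refresh is what breaks a loop in which the model keeps repeating the same wrong answer; how the actual framework makes these decisions should be taken from the paper itself.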

Problem

Research questions and friction points this paper is trying to address.

Enhance iterative reflection in Large Language Models.

Address issues of redundancy, drift, and stubbornness in static reflection methods.

Improve performance in mathematical and commonsense reasoning tasks.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic-meta instruction enhances LLM reflection.

Instructor uses meta-thoughts for reflection guidance.

Self-consistency classifier improves iterative reflection.
