Instruct-of-Reflection: Enhancing Large Language Models Iterative Reflection Capabilities via Dynamic-Meta Instruction

📅 2025-03-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Static self-reflection in large language models (LLMs) often induces redundancy, reasoning drift, and stubbornness, while lacking intrinsic correction mechanisms without external feedback. Method: We propose IoRT, an Iterative Reflection framework driven by dynamic meta-instructions. IoRT introduces a novel, refreshable, terminable, and selective multi-strategy meta-instruction mechanism, integrating a meta-cognitive instruction generator and a self-consistency classifier to enable interpretable intervention and adaptive termination of reflection. It comprises three core components: instruction-driven reflection control, iterative response reweighting, and dynamic termination. Contribution/Results: Evaluated on mathematical and commonsense reasoning benchmarks, IoRT achieves an average 10.1% improvement over strong baselines—including Chain-of-Thought and Self-Refine—demonstrating the first systematic realization of controllable, interpretable, and adaptively optimized reflection behavior in LLMs.

📝 Abstract
Self-reflection for Large Language Models (LLMs) has gained significant attention. Existing approaches involve models iterating on and improving their previous responses based on LLMs' internal reflection ability or external feedback. However, recent research has raised doubts about intrinsic self-correction, suggesting that without external feedback it may even degrade performance. Based on our empirical evidence, we find that current static reflection methods can suffer from issues of redundancy, drift, and stubbornness. To mitigate this, we introduce Instruct-of-Reflection (IoRT), a novel and general reflection framework that leverages dynamic-meta instruction to enhance the iterative reflection capability of LLMs. Specifically, we propose an instructor, driven by meta-thoughts and a self-consistency classifier, that generates various instructions, including refresh, stop, and select, to guide the next reflection iteration. Our experiments demonstrate that IoRT achieves an average improvement of 10.1% over established baselines on mathematical and commonsense reasoning tasks, highlighting its efficacy and applicability.
Problem

Research questions and friction points this paper is trying to address.

Enhance iterative reflection in Large Language Models.
Address issues of redundancy, drift, and stubbornness in static reflection methods.
Improve performance in mathematical and commonsense reasoning tasks.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic-meta instruction enhances LLM reflection.
Instructor uses meta-thoughts for reflection guidance.
Self-consistency classifier improves iterative reflection.
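The control flow described above — an instructor that inspects the current batch of answers and emits a refresh, stop, or select instruction for the next reflection iteration — can be sketched in a few lines. This is an illustrative reconstruction, not the authors' implementation: the `generate`, `reflect`, and threshold choices are assumptions, and the self-consistency signal is approximated by simple majority vote over sampled answers.

```python
from collections import Counter


def self_consistency_label(samples):
    """Majority vote over sampled answers; returns (answer, agreement ratio)."""
    answer, freq = Counter(samples).most_common(1)[0]
    return answer, freq / len(samples)


def iort_reflect(question, generate, reflect, max_iters=4,
                 stop_threshold=0.6, refresh_threshold=0.3):
    """Illustrative IoRT-style loop (thresholds and callbacks are assumptions).

    generate(question) -> list of sampled answer strings
    reflect(question, answer) -> one revised answer string
    """
    samples = generate(question)
    answer, agreement = self_consistency_label(samples)
    for _ in range(max_iters):
        # Instructor: map the self-consistency signal to a meta-instruction.
        if agreement >= stop_threshold:
            instruction = "stop"      # answers agree: terminate reflection
        elif agreement < refresh_threshold:
            instruction = "refresh"   # answers scattered: restart with fresh drafts
        else:
            instruction = "select"    # keep the majority answer and refine it
        if instruction == "stop":
            break
        if instruction == "refresh":
            samples = generate(question)
        else:
            samples = [reflect(question, answer) for _ in samples]
        answer, agreement = self_consistency_label(samples)
    return answer
```

In this sketch the stop instruction implements the paper's adaptive termination, while refresh and select correspond to discarding versus refining the current reflection trajectory; in the actual framework these decisions come from a meta-cognitive instruction generator rather than fixed thresholds.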
Liping Liu
Beijing University of Posts and Telecommunications, State Key Laboratory of Networking and Switching Technology, Beijing 100876, China
Chunhong Zhang
Beijing University of Posts and Telecommunications, State Key Laboratory of Networking and Switching Technology, Beijing 100876, China
Likang Wu
Tianjin University
Chuang Zhao
PhD Candidate, The Hong Kong University of Science and Technology
AI for Healthcare, Recommendation System, Transfer Learning
Zheng Hu
Beijing University of Posts and Telecommunications, State Key Laboratory of Networking and Switching Technology, Beijing 100876, China
Ming He
AI Lab of Lenovo Research
Jianping Fan
AI Lab at Lenovo Research
AI, Computer Vision, Machine Learning, Quantum Computing