ReflectEvo: Improving Meta Introspection of Small LLMs by Learning Self-Reflection

📅 2025-05-22
🤖 AI Summary
Small language models (SLMs) often lack robust metacognitive awareness and reasoning capabilities. Method: We propose ReflectEvo, the first self-driven reflective evolution framework for SLMs, wherein models iteratively generate high-quality, multi-domain reflective data (460K samples) without reliance on large-model distillation or human annotation. ReflectEvo integrates instruction expansion and multi-task coverage strategies to construct diverse training data, followed by supervised fine-tuning (SFT) and direct preference optimization (DPO). Contribution/Results: On standard reasoning benchmarks, Llama-3 and Mistral achieve absolute accuracy gains of 18.8% and 26.7%, reaching 71.2% and 71.1%, respectively. On BIG-bench, they match or surpass three leading open-weight large language models—Llama-3-70B, Mixtral-8x22B, and Qwen2-72B—demonstrating that self-reflection critically enables error localization, correction, and sustained self-improvement in SLMs.
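The self-driven loop described above — attempt, reflect on the error, retry, and keep the corrected trace for training — can be sketched as follows. This is a minimal illustration, not the paper's implementation: `fake_slm` is a stub standing in for SLM inference, and all prompt formats and field names are hypothetical.

```python
def fake_slm(prompt):
    """Stub standing in for SLM inference (illustrative only)."""
    if prompt.startswith("Reflect"):
        return "I dropped a carry in the addition; the sum is 42."
    if prompt.startswith("Retry"):
        return "42"
    return "41"  # deliberately wrong first attempt, for the demo

def build_reflection_sample(question, gold, max_rounds=2):
    """One unit of self-generated reflection data: if the model's first
    answer is wrong, ask it to reflect, retry, and keep the
    (prompt, reflection, chosen, rejected) record for SFT / DPO."""
    rejected = fake_slm(question)          # first attempt
    if rejected.strip() == gold:
        return None                        # already correct: nothing to learn
    attempt = rejected
    for _ in range(max_rounds):
        reflection = fake_slm(f"Reflect: Q={question} A={attempt}")
        attempt = fake_slm(f"Retry: Q={question} R={reflection}")
        if attempt.strip() == gold:
            # corrected answer preferred over the original faulty one
            return {"prompt": question,
                    "reflection": reflection,
                    "chosen": attempt,
                    "rejected": rejected}
    return None                            # never corrected: discard

sample = build_reflection_sample("What is 19 + 23?", "42")
```

Run at scale over multi-domain tasks, records like `sample` are exactly the kind of self-generated supervision (no larger teacher model, no human labels) that a corpus such as ReflectEvo-460k would collect.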

📝 Abstract
We present a novel pipeline, ReflectEvo, to demonstrate that small language models (SLMs) can enhance meta introspection through reflection learning. This process iteratively generates self-reflection for self-training, fostering a continuous and self-evolving process. Leveraging this pipeline, we construct ReflectEvo-460k, a large-scale, comprehensive, self-generated reflection dataset with broadened instructions and diverse multi-domain tasks. Building upon this dataset, we demonstrate the effectiveness of reflection learning to improve SLMs' reasoning abilities using SFT and DPO with remarkable performance, substantially boosting Llama-3 from 52.4% to 71.2% and Mistral from 44.4% to 71.1%. It validates that ReflectEvo can rival or even surpass the reasoning capability of three prominent open-source models on BIG-bench without distillation from superior models or fine-grained human annotation. We further conduct a deeper analysis of the high quality of self-generated reflections and their impact on error localization and correction. Our work highlights the potential of continuously enhancing the reasoning performance of SLMs through iterative reflection learning in the long run.
Problem

Research questions and friction points this paper is trying to address.

Enhancing meta introspection in small language models via reflection learning
Creating a self-generated reflection dataset for diverse multi-domain tasks
Improving reasoning abilities of SLMs without human annotation or distillation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Iterative self-reflection learning for SLMs
Large-scale self-generated reflection dataset
Boosts reasoning via SFT and DPO
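The DPO stage named above trains on preference pairs where the reflection-corrected answer is preferred over the original faulty one. A minimal sketch of the standard DPO objective for one such pair, using scalar sequence log-probabilities; the function name, `beta` default, and inputs are illustrative assumptions, not values from the paper.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair:
    -log sigmoid(beta * [(logpi(y_w) - logpi_ref(y_w))
                         - (logpi(y_l) - logpi_ref(y_l))]).
    Here y_w is the corrected answer, y_l the original faulty one."""
    margin = ((logp_chosen - ref_logp_chosen)
              - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

The loss shrinks as the policy raises the corrected answer's likelihood relative to the rejected one (measured against the frozen reference model), and equals log 2 when the pair is indistinguishable.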
Jiaqi Li
State Key Laboratory of General Artificial Intelligence, BIGAI
Xinyi Dong
State Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University
Yang Liu
State Key Laboratory of General Artificial Intelligence, BIGAI
Zhizhuo Yang
Rochester Institute of Technology
Artificial Intelligence · Reinforcement Learning · Active Inference · Eye Tracking · AR/VR
Quansen Wang
State Key Laboratory of General Artificial Intelligence, BIGAI, Peking University
Xiaobo Wang
University of Science and Technology of China
Natural Language Processing
Song-Chun Zhu
State Key Laboratory of General Artificial Intelligence, BIGAI, Peking University
Zixia Jia
BIGAI
NLP
Zilong Zheng
State Key Laboratory of General Artificial Intellligence, BIGAI