ReflectEvo: Improving Meta Introspection of Small LLMs by Learning Self-Reflection

📅 2025-05-22

🤖 AI Summary

Problem: Small language models (SLMs) often lack robust metacognitive awareness and reasoning capabilities. Method: We propose ReflectEvo, a self-driven reflective evolution pipeline in which SLMs iteratively generate their own high-quality, multi-domain reflection data (460K samples) without relying on distillation from larger models or on human annotation. ReflectEvo combines instruction expansion with multi-task coverage to build diverse training data, on which the models are then trained with supervised fine-tuning (SFT) and direct preference optimization (DPO). Contribution/Results: On reasoning benchmarks, Llama-3 and Mistral gain 18.8 and 26.7 percentage points in absolute accuracy, reaching 71.2% and 71.1%, respectively. On BIG-bench, they match or surpass three leading open-weight large language models (Llama-3-70B, Mixtral-8x22B, and Qwen2-72B). Further analysis indicates that self-reflection helps SLMs localize and correct their errors, supporting sustained self-improvement.
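The reported gains are absolute differences in percentage points, not relative improvements. A quick check using the baseline and final accuracies stated in the abstract (Llama-3: 52.4% to 71.2%; Mistral: 44.4% to 71.1%) can be sketched as follows. The relative-gain figures it also computes are plain arithmetic on those numbers and are not reported by the authors.

```python
# Accuracies (%) as stated in the abstract: (before, after) reflection learning.
REPORTED = {
    "Llama-3": (52.4, 71.2),
    "Mistral": (44.4, 71.1),
}


def absolute_gain(baseline: float, final: float) -> float:
    """Gain in percentage points (the figure quoted in the summary)."""
    return round(final - baseline, 1)


def relative_gain(baseline: float, final: float) -> float:
    """Improvement relative to the baseline, in percent (derived, not reported)."""
    return round(100 * (final - baseline) / baseline, 1)


for model, (baseline, final) in REPORTED.items():
    print(f"{model}: +{absolute_gain(baseline, final)} points, "
          f"+{relative_gain(baseline, final)}% relative")
```

The distinction matters for Mistral in particular: a 26.7-point gain from a 44.4% baseline is roughly a 60% relative improvement.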

📝 Abstract

We present a novel pipeline, ReflectEvo, to demonstrate that small language models (SLMs) can enhance meta introspection through reflection learning. The pipeline iteratively generates self-reflections for self-training, yielding a continuous, self-evolving loop. Leveraging this pipeline, we construct ReflectEvo-460k, a large-scale, comprehensive, self-generated reflection dataset with broadened instructions and diverse multi-domain tasks. Building on this dataset, we show that reflection learning with SFT and DPO substantially improves SLMs' reasoning abilities, boosting Llama-3 from 52.4% to 71.2% and Mistral from 44.4% to 71.1%. These results show that ReflectEvo can rival or even surpass the reasoning capability of three prominent open-source models on BIG-bench without distillation from superior models or fine-grained human annotation. We further analyze the quality of the self-generated reflections and their impact on error localization and correction. Our work highlights the potential of iterative reflection learning to continuously enhance the reasoning performance of SLMs in the long run.
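One way to picture the iterative self-reflection loop is the minimal sketch of a single data-collection round below. It is a hypothetical illustration, not ReflectEvo's actual implementation: `collect_round`, `toy_generate`, the prompt templates, and the rule for turning reflections into SFT examples and DPO preference pairs are all assumptions. The assumed rule is that a reflection counts as helpful when the model's retry, conditioned on it, becomes correct, and each helpful reflection is paired against unhelpful ones as (chosen, rejected).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical interfaces: a generator maps a prompt to a completion, and a
# checker decides whether an answer matches the reference answer.
Generate = Callable[[str], str]
IsCorrect = Callable[[str, str], bool]


@dataclass
class ReflectionData:
    sft: List[Tuple[str, str]] = field(default_factory=list)  # (prompt, helpful reflection)
    preference: List[Tuple[str, str, str]] = field(default_factory=list)  # (prompt, chosen, rejected)


def collect_round(
    tasks: List[Tuple[str, str]],
    generate: Generate,
    is_correct: IsCorrect,
    n_reflections: int = 2,
) -> ReflectionData:
    """One hypothetical round of self-generated reflection data collection."""
    data = ReflectionData()
    for question, reference in tasks:
        first = generate(f"Q: {question}\nA:")
        if is_correct(first, reference):
            continue  # only failed first attempts call for a reflection
        prompt = f"Q: {question}\nWrong answer: {first}\nReflect on the error:"
        helpful, unhelpful = [], []
        for i in range(n_reflections):
            reflection = generate(f"{prompt} [sample {i}]")
            # Retry the task conditioned on the reflection; sort the reflection by outcome.
            retry = generate(f"Q: {question}\nReflection: {reflection}\nA:")
            (helpful if is_correct(retry, reference) else unhelpful).append(reflection)
        data.sft.extend((prompt, r) for r in helpful)
        data.preference.extend((prompt, h, u) for h in helpful for u in unhelpful)
    return data


def toy_generate(prompt: str) -> str:
    """Deterministic stand-in for a small LM, for illustration only."""
    if "Reflect on the error" in prompt:
        return "recheck the sum" if "[sample 0]" in prompt else "the answer looks fine"
    if "Reflection: recheck" in prompt:
        return "4"
    return "5"


data = collect_round(
    [("2+2", "4"), ("3+2", "5")],
    toy_generate,
    lambda out, ref: out.strip() == ref,
)
print(data.sft)
print(data.preference)
```

Here "3+2" is answered correctly on the first try and skipped, while "2+2" yields one helpful and one unhelpful reflection. In a multi-round setting, the collected `sft` and `preference` data would fine-tune the model before the next round, so later reflections come from an improved model; that training step is omitted from this sketch.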

Problem

Research questions and friction points this paper is trying to address.

Enhancing meta introspection in small language models via reflection learning

Creating a self-generated reflection dataset for diverse multi-domain tasks

Improving reasoning abilities of SLMs without human annotation or distillation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Iterative self-reflection learning for SLMs

Large-scale self-generated reflection dataset

Boosts reasoning via SFT and DPO
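For reference, the standard DPO objective (Rafailov et al., 2023) for one preference pair is loss = -log σ(β[(log πθ(y_w|x) - log π_ref(y_w|x)) - (log πθ(y_l|x) - log π_ref(y_l|x))]), where y_w is the chosen response and y_l the rejected one. A minimal sketch of computing it is shown below. The summary does not specify ReflectEvo's DPO settings: `beta=0.1` is an assumed placeholder, and the inputs are assumed to be summed token log-probabilities of each full response under the policy being trained and a frozen reference model (in standard DPO, usually the SFT checkpoint).

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value; ReflectEvo's setting is not given here
) -> float:
    """DPO loss for a single (chosen, rejected) pair.

    Each argument is the summed log-probability of a full response given the
    prompt, under either the trained policy or the frozen reference model.
    """
    # Scaled implicit reward difference: how much more the policy (relative to
    # the reference) prefers the chosen response over the rejected one.
    margin = beta * (
        (policy_chosen_logp - ref_chosen_logp)
        - (policy_rejected_logp - ref_rejected_logp)
    )
    # -log(sigmoid(margin)) == softplus(-margin), computed in a numerically stable way.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Policy identical to the reference: margin 0, loss log(2).
print(dpo_loss(-5.0, -5.0, -5.0, -5.0))
# Policy shifts probability toward the chosen response: loss drops below log(2).
print(dpo_loss(-4.0, -6.0, -5.0, -5.0))
```

The stable softplus form avoids overflow in `exp` when the margin is large in magnitude, which can happen with long responses whose summed log-probabilities differ by hundreds of nats.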
