🤖 AI Summary
This study investigates whether large language models (LLMs) spontaneously develop self-reflection and error-correction capabilities during pretraining. To this end, we propose a controllable error-injection method grounded in chain-of-thought (CoT) reasoning and construct the first pretraining-stage benchmark for evaluating introspective abilities, spanning six task categories including logical reasoning, mathematical problem solving, and symbolic manipulation. We track the evolution of these capabilities across the 4-trillion-token pretraining trajectory of OLMo2-7B. Results demonstrate that self-reflective capacity emerges early in pretraining without reinforcement learning fine-tuning: models consistently detect and correct injected reasoning errors, and self-correction accuracy improves steadily across all six tasks. This work provides the first empirical validation of intrinsic introspective capability in pretraining, offering both a novel perspective on LLM cognitive development and a reproducible, task-diverse evaluation framework for studying emergent self-monitoring behaviors.
📝 Abstract
A language model's ability to reflect on its own reasoning provides a key advantage for solving complex problems. While most recent research has focused on how this ability develops during reinforcement learning, we show that it actually begins to emerge much earlier, during the model's pre-training. To study this, we introduce deliberate errors into chains-of-thought and test whether the model can still arrive at the correct answer by recognizing and correcting these mistakes. By tracking performance across different stages of pre-training, we observe that this self-correcting ability appears early and improves steadily over time. For instance, an OLMo2-7B model pre-trained on 4 trillion tokens already displays self-correction behavior on all six of our self-reflection tasks.