Rethinking Reflection in Pre-Training

📅 2025-04-05
🤖 AI Summary
This study investigates whether large language models (LLMs) spontaneously develop self-reflection and error-correction capabilities during pre-training. To this end, we propose a controllable error-injection method grounded in chain-of-thought (CoT) reasoning and construct the first pre-training-stage benchmark for evaluating introspective abilities, spanning six task categories including logical reasoning, mathematical problem solving, and symbolic manipulation. We track the evolution of these capabilities across the 4-trillion-token pre-training trajectory of OLMo2-7B. Results demonstrate that self-reflective capacity emerges early in pre-training, without reinforcement learning fine-tuning: models consistently detect and correct injected reasoning errors, and self-correction accuracy improves steadily across all six tasks. This work provides the first empirical validation of intrinsic introspective capability in pre-training, offering both a novel perspective on LLM cognitive development and a reproducible, task-diverse evaluation framework for studying emergent self-monitoring behaviors.

📝 Abstract
A language model's ability to reflect on its own reasoning provides a key advantage for solving complex problems. While most recent research has focused on how this ability develops during reinforcement learning, we show that it actually begins to emerge much earlier, during the model's pre-training. To study this, we introduce deliberate errors into chains-of-thought and test whether the model can still arrive at the correct answer by recognizing and correcting these mistakes. By tracking performance across different stages of pre-training, we observe that this self-correcting ability appears early and improves steadily over time. For instance, an OLMo2-7B model pre-trained on 4 trillion tokens displays self-correction on our six self-reflection tasks.
Problem

Research questions and friction points this paper is trying to address.

Study self-reflection emergence in pre-training models
Test model ability to correct deliberate reasoning errors
Track self-correction improvement across pre-training stages
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introducing deliberate errors in reasoning chains
Tracking self-correction during pre-training stages
Observing early emergence of reflection ability
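The error-injection idea above can be sketched in a few lines. This is an illustrative assumption, not the paper's exact procedure: the regex-based perturbation, the `Wait,` continuation cue, and all function names (`corrupt_result`, `build_adversarial_prompt`) are hypothetical stand-ins for whatever construction the authors actually use.

```python
import re

def corrupt_result(step: str, delta: int = 1) -> str:
    """Inject a deliberate error into one CoT step by shifting the
    numeric result that follows '='."""
    m = re.search(r"=\s*(-?\d+)", step)
    if m is None:
        return step  # no numeric result to corrupt
    wrong = int(m.group(1)) + delta
    return step[:m.start(1)] + str(wrong) + step[m.end(1):]

def build_adversarial_prompt(question: str, steps: list[str], bad_idx: int) -> str:
    """Replace one step with its corrupted version, then end with a
    continuation cue so the model must notice and repair the error
    to reach the correct final answer."""
    adv_steps = list(steps)
    adv_steps[bad_idx] = corrupt_result(adv_steps[bad_idx])
    return question + "\n" + "\n".join(adv_steps) + "\nWait,"

steps = ["Step 1: 12 * 3 = 36.", "Step 2: 36 + 4 = 40."]
prompt = build_adversarial_prompt("What is 12 * 3 + 4?", steps, bad_idx=0)
```

A pre-training checkpoint is then scored on whether its continuation of `prompt` recovers the correct answer despite the corrupted step.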
Authors

Essential AI, San Francisco, CA: Darsh J Shah, Peter Rushton, Somanshu Singla, Mohit Parmar, Kurt Smith, Yash Vanjani, Ashish Vaswani, Adarsh Chaluvaraju, Andrew Hojel, Andrew Ma, Anil Thomas, Anthony Polloreno, Ashish Tanwer, Burhan Drak Sibai, Divya S Mansingka, Divya Shivaprasad, Ishaan Shah, Karl Stratos, Khoi Nguyen, Michael Callahan, Michael Pust, Mrinal Iyer, Philip Monk, Platon Mazarakis, Ritvik Kapila, Saurabh Srivastava, Tim Romanski