Self-Improving Pretraining: using post-trained models to pretrain better models

📅 2026-01-29
📈 Citations: 1
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the challenge that large language models often acquire unsafe, factually inaccurate, or low-quality generation patterns during pretraining, patterns that are difficult to fully rectify in later alignment stages. To mitigate this, the authors propose a self-improving pretraining approach that streams documents and uses reinforcement learning to optimize the next K generated tokens at each step. A strong post-trained model serves as a judge, evaluating factual consistency, safety, and quality across three candidates: the model's own continuation, the original document suffix, and a rewritten suffix. The training signal adapts to these assessments: early in training it relies on the original and rewritten suffixes, and as the model improves, RL rewards high-quality rollouts. By integrating post-trained model feedback directly into the pretraining phase, the method optimizes model behavior at the source. Experiments show substantial gains over standard pretraining: 36.2% and 18.5% relative improvements in factuality and safety, and up to an 86.3% win rate in overall generation quality.

📝 Abstract
Ensuring safety, factuality and overall quality in the generations of large language models is a critical challenge, especially as these models are increasingly deployed in real-world applications. The prevailing approach to addressing these issues involves collecting expensive, carefully curated datasets and applying multiple stages of fine-tuning and alignment. However, even this complex pipeline cannot guarantee the correction of patterns learned during pretraining. Therefore, addressing these issues during pretraining is crucial, as it shapes a model's core behaviors and prevents unsafe or hallucinated outputs from becoming deeply embedded. To tackle this issue, we introduce a new pretraining method that streams documents and uses reinforcement learning (RL) to improve the next K generated tokens at each step. A strong, post-trained model judges candidate generations -- including model rollouts, the original suffix, and a rewritten suffix -- for quality, safety, and factuality. Early in training, the process relies on the original and rewritten suffixes; as the model improves, RL rewards high-quality rollouts. This approach builds higher quality, safer, and more factual models from the ground up. In experiments, our method gives 36.2% and 18.5% relative improvements over standard pretraining in terms of factuality and safety, and up to 86.3% win rate improvements in overall generation quality.
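The candidate-selection step described in the abstract can be sketched as follows. This is a toy illustration under assumed names (`Candidate`, `judge`, `select_target` are hypothetical), not the authors' implementation: a post-trained judge scores three candidate continuations of a document prefix, and early in training the model's own rollout is excluded from the pool.

```python
# Toy sketch of per-step candidate judging during self-improving pretraining.
# All names and scores here are illustrative assumptions, not the paper's code.
from dataclasses import dataclass

@dataclass
class Candidate:
    text: str
    source: str  # one of "rollout", "original", "rewrite"

def judge(candidate: Candidate) -> float:
    """Stand-in for the post-trained judge model. In the paper this is a
    combined factuality / safety / quality assessment; here we return fixed
    toy scores keyed by source so the control flow is easy to follow."""
    toy_scores = {"rollout": 0.4, "original": 0.7, "rewrite": 0.9}
    return toy_scores[candidate.source]

def select_target(candidates: list[Candidate], trust_rollout: bool) -> Candidate:
    """Pick the highest-judged candidate. Early in training the model's own
    rollout is ignored (trust_rollout=False) because the model is still weak;
    once it improves, RL can reward high-quality rollouts directly."""
    pool = [c for c in candidates if trust_rollout or c.source != "rollout"]
    return max(pool, key=judge)

candidates = [
    Candidate("model continuation ...", "rollout"),
    Candidate("original suffix ...", "original"),
    Candidate("rewritten suffix ...", "rewrite"),
]
print(select_target(candidates, trust_rollout=False).source)  # -> rewrite
```

With these toy scores the rewritten suffix wins either way; the `trust_rollout` flag only changes whether the model's own rollout is allowed to compete, mirroring the curriculum the abstract describes.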
Problem

Research questions and friction points this paper is trying to address.

pretraining
safety
factuality
large language models
generation quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-Improving Pretraining
Reinforcement Learning
Factuality
Safety
Post-trained Model
Ellen Xiaoqing Tan
FAIR at Meta
S. Dhuliawala
FAIR at Meta
Jing Xu
Meta AI Research (FAIR)
NLP, machine learning, game theory
Ping Yu
FAIR at Meta
Sainbayar Sukhbaatar
FAIR team, Meta AI
deep learning, machine learning
Jason E. Weston
FAIR at Meta
Olga Golovneva
FAIR at Meta