🤖 AI Summary
Conventional correlational analyses fail to establish causal links between training data and language model (LM) behavior.
Method: We propose a “rewriting history” intervention framework that systematically identifies, modifies, and re-trains on training documents containing target knowledge—leveraging co-occurrence statistics and information retrieval for precise document matching—and quantifies behavioral changes via standardized benchmarks.
Contribution/Results: This work introduces the first controlled, causal data intervention at the training stage, moving beyond observational studies to enable rigorous causal testing of data effects on LM behavior. Experiments demonstrate that localized data rewriting significantly alters model knowledge expression; however, current matching strategies remain insufficient to fully account for knowledge acquisition, revealing the inherent complexity of the mapping between training data and emergent model knowledge.
📝 Abstract
We present an experimental recipe for studying the relationship between training data and language model (LM) behavior. We outline steps for intervening on data batches, i.e., "rewriting history," and then retraining model checkpoints over that data to test hypotheses relating data to behavior. Our recipe breaks down such an intervention into stages that include selecting evaluation items from a benchmark that measures model behavior, matching relevant documents to those items, and modifying those documents before retraining and measuring the effects. We demonstrate the utility of our recipe through case studies on factual knowledge acquisition in LMs, using both co-occurrence statistics and information retrieval methods to identify documents that might contribute to knowledge learning. Our results supplement past observational analyses that link co-occurrence to model behavior, while demonstrating that extant methods for identifying relevant training documents do not fully explain an LM's ability to correctly answer knowledge questions. Overall, we outline a recipe that researchers can follow to test further hypotheses about how training data affects model behavior. Our code is made publicly available to promote future work.
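The matching and modification stages described above can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's implementation: it stands in for the co-occurrence matcher with a simple check that a question's subject and answer both appear in a document, and it "rewrites history" by substituting a counterfactual answer into the matched documents before retraining. All function names and the toy corpus are invented for this sketch.

```python
def match_by_cooccurrence(documents, subject, answer):
    """Hypothetical matching stage: return indices of documents in which
    the question's subject and its answer co-occur (case-insensitive)."""
    return [
        i for i, doc in enumerate(documents)
        if subject.lower() in doc.lower() and answer.lower() in doc.lower()
    ]

def rewrite_documents(documents, matched_indices, old_answer, new_answer):
    """'Rewriting history': replace the original answer with a
    counterfactual one in every matched document. The edited corpus
    would then be used to retrain a model checkpoint."""
    rewritten = list(documents)
    for i in matched_indices:
        rewritten[i] = rewritten[i].replace(old_answer, new_answer)
    return rewritten

# Toy corpus standing in for a batch of training documents.
corpus = [
    "Paris is the capital of France.",
    "France borders Spain and Italy.",
    "The Eiffel Tower stands in Paris.",
]

# Only the first document mentions both "France" and "Paris".
matched = match_by_cooccurrence(corpus, "France", "Paris")  # -> [0]
edited = rewrite_documents(corpus, matched, "Paris", "Lyon")
```

After retraining on `edited`, the recipe measures whether the model's answer to "What is the capital of France?" shifts toward the counterfactual, quantifying the causal contribution of the matched documents.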