🤖 AI Summary
This paper investigates whether large language models (LLMs) can avoid anachronism, i.e., temporal inconsistencies in linguistic style and conceptual framing, when simulating historical contexts. Methodologically, the authors develop a dual-evaluation framework that combines automated discriminators with expert human assessment, and use it to rigorously test multiple post-training strategies, including prompt engineering and supervised fine-tuning. Results reveal that while fine-tuned models can deceive automated detectors, human evaluators consistently identify statistically significant deviations from authentic historical texts in both diction and epistemic stance. The core contribution is the first empirical demonstration that post-training interventions alone are insufficient to eliminate anachronism; the study argues instead that temporally grounded pretraining (incorporating period-specific corpora during pretraining) is a more fundamental and effective mitigation strategy. This finding challenges the prevailing assumption that alignment techniques suffice for historical fidelity and underscores the importance of temporal grounding in foundational model training.
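The automated-discriminator half of such an evaluation can be illustrated with a toy sketch: a smoothed unigram language-model classifier that scores a text under an "authentic" model and a "generated" model and picks the likelier label. This is not the paper's actual discriminator (which is not specified here), and the miniature corpora below are invented purely for illustration; a real evaluation would use period texts and LLM outputs at scale.

```python
import math
from collections import Counter

def train(texts):
    """Fit a unigram model: word counts, total tokens, and smoothed vocab size."""
    counts = Counter(w for t in texts for w in t.lower().split())
    return counts, sum(counts.values()), len(counts) + 1

def log_prob(text, model):
    """Add-one-smoothed log-likelihood of the text under a unigram model."""
    counts, total, vocab = model
    return sum(math.log((counts[w] + 1) / (total + vocab))
               for w in text.lower().split())

def discriminate(text, authentic_model, generated_model):
    """Label the text with whichever model assigns it higher likelihood."""
    return ("authentic"
            if log_prob(text, authentic_model) >= log_prob(text, generated_model)
            else "generated")

# Invented toy corpora, stand-ins for period prose vs. LLM output.
authentic = ["whilst the gentleman did thus proclaim",
             "thereupon the assembly did adjourn"]
generated = ["the model outputs fluent modern prose",
             "users can prompt the system easily"]

auth_model = train(authentic)
gen_model = train(generated)
print(discriminate("the gentleman did proclaim", auth_model, gen_model))  # → authentic
```

The paper's point is precisely that passing such an automated check is a weak guarantee: a fine-tuned model can fool a discriminator like this while still reading as anachronistic to expert human judges.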
📝 Abstract
Before researchers can use language models to simulate the past, they need to understand the risk of anachronism. We find that prompting a contemporary model with examples of period prose does not produce output consistent with period style. Fine-tuning produces results stylistically convincing enough to fool an automated judge, but human evaluators can still distinguish fine-tuned model outputs from authentic historical text. We tentatively conclude that pretraining on period prose may be required to reliably simulate historical perspectives for social research.