Can Language Models Represent the Past without Anachronism?

📅 2025-04-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper investigates whether large language models (LLMs) can avoid anachronism (temporal inconsistency in linguistic style and conceptual framing) when simulating historical contexts. Methodologically, the authors develop a dual evaluation framework that pairs automated discriminators with expert human assessment to test several post-training strategies, including prompt engineering and supervised fine-tuning. Results show that while fine-tuned models can deceive automated detectors, human evaluators still identify statistically significant deviations from authentic historical texts in both diction and epistemic stance. The core contribution is empirical evidence that post-training interventions alone are insufficient to eliminate anachronism; the study tentatively concludes that temporally grounded pretraining, i.e., incorporating period-specific corpora during pretraining, may be required. This finding challenges the assumption that alignment techniques suffice for historical fidelity and underscores the importance of temporal grounding in foundational model training.
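The paper does not specify how its automated discriminator works, but the general idea (a classifier that tries to tell authentic period text from model output) can be sketched as a toy add-one-smoothed unigram likelihood-ratio test. Everything below (the function names, the sample sentences) is illustrative, not the authors' implementation:

```python
from collections import Counter
import math

def train_counts(texts):
    """Unigram counts over a corpus of whitespace-tokenised texts."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts

def log_likelihood(text, counts, total, vocab_size):
    """Add-one-smoothed unigram log-likelihood of `text` under `counts`."""
    return sum(
        math.log((counts[w] + 1) / (total + vocab_size))
        for w in text.lower().split()
    )

def discriminate(text, period_counts, model_counts):
    """Label `text` by which corpus assigns it the higher likelihood."""
    vocab_size = len(set(period_counts) | set(model_counts))
    lp = log_likelihood(text, period_counts, sum(period_counts.values()), vocab_size)
    lm = log_likelihood(text, model_counts, sum(model_counts.values()), vocab_size)
    return "authentic" if lp > lm else "generated"

# Toy corpora (illustrative only): archaic vs. contemporary diction.
period = train_counts(["whilst thou art abroad", "the physick availeth thee"])
model = train_counts(["the model leverages robust scalable insights",
                      "we optimize engagement metrics"])

print(discriminate("whilst thou availeth", period, model))         # → authentic
print(discriminate("scalable engagement metrics", period, model))  # → generated
```

A real discriminator would of course be a trained neural classifier rather than a unigram model, but the paper's central finding holds at any level of sophistication: a fine-tuned generator can saturate such an automated judge while human readers still detect differences in diction and epistemic stance.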

📝 Abstract
Before researchers can use language models to simulate the past, they need to understand the risk of anachronism. We find that prompting a contemporary model with examples of period prose does not produce output consistent with period style. Fine-tuning produces results that are stylistically convincing enough to fool an automated judge, but human evaluators can still distinguish fine-tuned model outputs from authentic historical text. We tentatively conclude that pretraining on period prose may be required in order to reliably simulate historical perspectives for social research.
Problem

Research questions and friction points this paper is trying to address.

Assessing anachronism risk in language models for historical simulation
Evaluating period style accuracy in prompted contemporary models
Determining pretraining needs for reliable historical perspective simulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Prompting with period prose fails stylistically
Fine-tuning fools automated style judges
Pretraining on period prose may be essential