🤖 AI Summary
This work addresses temporal leakage in large language models: when asked to answer from the standpoint of an earlier time, they often draw on knowledge that became available only after the specified cutoff. The authors first analyze prompt-level interventions and find that explicit cutoff statements outperform implicit historical framings, and that prefix constraints reduce leakage more than suffix constraints. Prompting alone, however, does not let a model verify whether a response is temporally admissible. They then propose TCFT (Temporal Critique Fine-Tuning), a framework that treats temporal admissibility as a relation between a response and the cutoff rather than a property of the response alone. Given a query, a cutoff, and a candidate response, a TCFT-trained model learns to identify post-cutoff leakage, explain the boundary violation, and judge admissibility. In experiments on Qwen2.5-7B-Instruct and Qwen2.5-14B-Instruct, TCFT outperforms prompting and supervised fine-tuning baselines, reducing average leakage by 41.89 and 37.79 percentage points, respectively.
📝 Abstract
Large language models (LLMs) often fail to reason under temporal cutoffs: when prompted to answer from the standpoint of an earlier time, they exploit knowledge that became available only later. We study this failure through the lens of ex-ante reasoning, where a model must rely exclusively on information knowable before a cutoff. Through a systematic analysis of prompt-level interventions, we find that temporal leakage is highly sensitive to cutoff formulation and instruction placement: explicit cutoff statements outperform implicit historical framings, and prefix constraints reduce leakage more effectively than suffix constraints. These findings indicate that prompting can steer models into a temporal frame, but does not endow them with the ability to verify whether a response is temporally admissible. We further argue that supervised fine-tuning is insufficient, since ex-ante correctness is not an intrinsic property of an answer, but a relation between the answer and the cutoff. To address this gap, we propose TCFT, a Temporal Critique Fine-Tuning framework that trains models to acquire cutoff-aware temporal verification. Given a query, a cutoff, and a candidate response, TCFT teaches the model to identify post-cutoff leakage, explain temporal boundary violations, and judge temporal admissibility. Experiments with Qwen2.5-7B-Instruct and Qwen2.5-14B-Instruct show that TCFT consistently outperforms prompting and SFT baselines, reducing average leakage by 41.89 and 37.79 percentage points, respectively.
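To make the two ideas in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It shows how an explicit cutoff constraint might be placed as a prefix or a suffix, and how a critique training record built from a (query, cutoff, candidate response) triple might be laid out. The prompt wording, the `build_prompt` helper, the `CritiqueExample` class, and its field names are all assumptions made for illustration; the abstract does not specify any of them.

```python
from dataclasses import dataclass
from datetime import date


def build_prompt(query: str, cutoff: date, placement: str = "prefix") -> str:
    """Attach an explicit cutoff constraint before or after the query.

    Hypothetical wording: the paper's actual prompt templates are not
    given in the abstract.
    """
    constraint = (
        "Answer using only information that was knowable before "
        f"{cutoff.isoformat()}."
    )
    if placement == "prefix":
        return f"{constraint}\n\n{query}"
    if placement == "suffix":
        return f"{query}\n\n{constraint}"
    raise ValueError(f"unknown placement: {placement!r}")


@dataclass
class CritiqueExample:
    """One critique record; the field names are illustrative assumptions."""

    query: str
    cutoff: date
    candidate: str
    critique: str      # explanation of any temporal boundary violation
    admissible: bool   # verdict relative to the cutoff, not the answer alone

    def to_training_pair(self) -> dict:
        """Serialize into an input/target pair for supervised training."""
        verdict = "admissible" if self.admissible else "inadmissible"
        return {
            "input": (
                f"Cutoff: {self.cutoff.isoformat()}\n"
                f"Query: {self.query}\n"
                f"Candidate response: {self.candidate}\n"
                "Critique the candidate for post-cutoff leakage."
            ),
            "target": f"{self.critique}\nVerdict: {verdict}",
        }


# Illustrative record: the candidate states an outcome that was only
# knowable after the cutoff, so it is judged inadmissible.
example = CritiqueExample(
    query="Who is likely to win the 2022 FIFA World Cup?",
    cutoff=date(2022, 1, 1),
    candidate="Argentina won the 2022 World Cup.",
    critique="The candidate reports the tournament result, which was "
             "decided in December 2022, after the cutoff.",
    admissible=False,
)
pair = example.to_training_pair()
```

The same candidate would be admissible under a later cutoff, which is why the verdict is stored alongside the cutoff instead of being attached to the answer by itself.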