Date Fragments: A Hidden Bottleneck of Tokenization for Temporal Reasoning

📅 2025-05-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Modern BPE tokenizers erroneously segment calendar dates (e.g., “20250312”) into semantically meaningless subtokens (e.g., “202”, “503”, “12”), exacerbating token redundancy and disrupting temporal structure—severely degrading model performance on time-reasoning tasks. Method: We first propose the “date fragmentation ratio” to quantify this defect; construct DateAugBench—a comprehensive benchmark spanning historical, contemporary, and future date scenarios; and conduct layer-wise probing and causal attention analysis to investigate how LLMs internally process fragmented dates. Contribution/Results: We discover that LLMs spontaneously develop a cross-layer “date abstraction” mechanism—reconstructing fragments along a hierarchical year→month→day pathway, yielding structured temporal reasoning distinct from human cognition. Empirically, fragmentation reduces accuracy on rare dates by up to 10 percentage points; larger models complete this abstraction at earlier layers; and it generalizes across date parsing, format-invariant interpretation, and date arithmetic tasks.
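The fragmentation metric described above can be sketched in a few lines. The paper's exact formula is not reproduced here, so the definition below (the fraction of interior token boundaries that fall inside a year/month/day component rather than between components) is an assumption for illustration:

```python
# Hedged sketch of a "date fragmentation ratio" for an 8-digit
# YYYYMMDD string. The paper's actual definition may differ.

def fragmentation_ratio(tokens):
    """Fraction of token boundaries that fall *inside* a date
    component (year, month, day) rather than between components.

    `tokens` is a tokenizer's segmentation of a YYYYMMDD string,
    e.g. ["202", "503", "12"].
    """
    if len("".join(tokens)) != 8:
        raise ValueError("expected 8 digits total")
    component_edges = {4, 6}        # the year|month and month|day splits
    cuts, pos = set(), 0
    for tok in tokens[:-1]:         # interior boundaries only
        pos += len(tok)
        cuts.add(pos)
    if not cuts:                    # a single token has no fragmentation
        return 0.0
    return len(cuts - component_edges) / len(cuts)

# A component-aligned split scores 0.0; the BPE segmentation from the
# abstract ("202" | "503" | "12") has one misaligned cut out of two:
print(fragmentation_ratio(["2025", "03", "12"]))   # 0.0
print(fragmentation_ratio(["202", "503", "12"]))   # 0.5
```

Under this toy definition, a tokenizer that emits the whole date as one token or splits exactly at component boundaries scores 0, and every boundary landing mid-component pushes the ratio toward 1.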

📝 Abstract
Modern BPE tokenizers often split calendar dates into meaningless fragments, e.g., 20250312 → 202, 503, 12, inflating token counts and obscuring the inherent structure needed for robust temporal reasoning. In this work, we (1) introduce a simple yet interpretable metric, termed the date fragmentation ratio, which measures how faithfully a tokenizer preserves multi-digit date components; (2) release DateAugBench, a suite of 6,500 examples spanning three temporal reasoning tasks: context-based date resolution, format-invariance puzzles, and date arithmetic across historical, contemporary, and future regimes; and (3) through layer-wise probing and causal attention-hop analyses, uncover an emergent date-abstraction mechanism whereby large language models stitch together the fragments of month, day, and year components for temporal reasoning. Our experiments show that excessive fragmentation correlates with accuracy drops of up to 10 points on uncommon dates such as historical and futuristic ones. Further, we find that the larger the model, the earlier the emergent date abstraction that heals date fragments is completed. Lastly, we observe the reasoning path LLMs follow to assemble date fragments, which typically differs from human interpretation (year → month → day).
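The fragment-healing behavior the probing analysis attributes to LLMs can be mimicked explicitly. `heal_fragments` below is a hypothetical helper written for illustration, not the paper's code; it simply concatenates the subtokens and resolves the result hierarchically, year first, then month, then day:

```python
from datetime import date

def heal_fragments(tokens):
    """Reassemble arbitrary YYYYMMDD subtoken fragments into a
    calendar date, resolving components in year -> month -> day
    order (the hierarchical pathway described in the paper)."""
    s = "".join(tokens)
    if len(s) != 8 or not s.isdigit():
        raise ValueError("expected 8 digits total")
    year, month, day = int(s[:4]), int(s[4:6]), int(s[6:8])
    return date(year, month, day)   # date() also validates the components

# The fragments "202", "503", "12" from the abstract resolve to 2025-03-12:
print(heal_fragments(["202", "503", "12"]))  # 2025-03-12
```

Note that the healing is independent of where the tokenizer placed its cuts; the point of the paper is that models must learn an analogous reassembly internally, and that rarer dates make this harder.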
Problem

Research questions and friction points this paper is trying to address.

BPE tokenizers split dates into meaningless fragments
Date fragmentation hinders robust temporal reasoning in LLMs
Excessive date fragmentation reduces accuracy on uncommon dates
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces date fragmentation ratio metric
Releases DateAugBench for temporal reasoning
Uncovers emergent date-abstraction mechanism in LLMs