TiMoE: Time-Aware Mixture of Language Experts

📅 2025-08-12
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) suffer from knowledge obsolescence and temporal leakage, i.e., inadvertently leveraging information posterior to the query time, because they are trained on static snapshots. To address this, we propose TiMoE, a Time-aware Mixture of Language Experts: a temporally grounded architecture that pretrains multiple time-aligned experts on disjoint corpus slices spanning 2013–2024. TiMoE employs time-gated routing and log-probability fusion to enforce strict causal validity: inference relies exclusively on knowledge available up to the query timestamp. We also introduce and publicly release TSQA, the first benchmark explicitly designed to evaluate temporal hallucination. Experiments demonstrate that TiMoE matches or surpasses the best single-period expert across eight NLP tasks and TSQA, while reducing future-knowledge misuse by up to 15%, significantly improving temporal consistency and factual reliability.

๐Ÿ“ Abstract
Large language models (LLMs) are typically trained on fixed snapshots of the web, which means that their knowledge becomes stale and their predictions risk temporal leakage: relying on information that lies in the future relative to a query. We tackle this problem by pre-training from scratch a set of GPT-style experts on disjoint two-year slices of a 2013-2024 corpus and combining them through TiMoE, a Time-aware Mixture of Language Experts. At inference time, TiMoE masks all experts whose training window ends after the query timestamp and merges the remaining log-probabilities in a shared space, guaranteeing strict causal validity while retaining the breadth of multi-period knowledge. We also release TSQA, a 10k-question benchmark whose answer alternatives are explicitly labelled as past, future or irrelevant, allowing fine-grained measurement of temporal hallucinations. Experiments on eight standard NLP tasks plus TSQA show that a co-adapted TiMoE variant matches or exceeds the best single-period expert and cuts future-knowledge errors by up to 15%. Our results demonstrate that modular, time-segmented pre-training paired with causal routing is a simple yet effective path toward LLMs that stay chronologically grounded without sacrificing much general performance. We open source our code at TiMoE (GitHub): https://github.com/epfml/TiMoE
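The inference-time mechanism the abstract describes, masking experts whose training window ends after the query timestamp and fusing the surviving log-probabilities in a shared space, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the dict-based vocabulary representation, and the choice of uniform averaging over valid experts are assumptions made for clarity.

```python
import math

def timoe_next_token_logprobs(expert_logprobs, window_ends, query_year):
    """Fuse per-expert next-token log-probabilities under a causal mask.

    expert_logprobs: list of dicts mapping token -> log-probability,
                     one dict per time-sliced expert (hypothetical format)
    window_ends:     last year covered by each expert's training corpus
    query_year:      timestamp of the query
    """
    # Time-gated routing: drop every expert whose training window
    # extends past the query timestamp, so no future knowledge leaks in.
    valid = [lp for lp, end in zip(expert_logprobs, window_ends)
             if end <= query_year]
    if not valid:
        raise ValueError("no expert's training window precedes the query")

    # Log-probability fusion in a shared space; here a uniform mixture,
    # p(tok) = (1/K) * sum_k p_k(tok), computed stably via log-sum-exp.
    vocab = set().union(*(lp.keys() for lp in valid))
    fused = {}
    for tok in vocab:
        logs = [lp.get(tok, float("-inf")) for lp in valid]
        m = max(logs)
        if m == float("-inf"):
            fused[tok] = float("-inf")
            continue
        fused[tok] = (m
                      + math.log(sum(math.exp(l - m) for l in logs))
                      - math.log(len(valid)))
    return fused
```

For a 2015 query, an expert trained on a 2015-2016 slice would be masked out even though its window overlaps the query year, since its corpus contains post-query text; only experts whose slices end at or before 2015 contribute to the fused distribution.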
Problem

Research questions and friction points this paper is trying to address.

Prevent temporal leakage in LLMs by time-aware expert masking
Ensure causal validity while retaining multi-period knowledge
Reduce future-knowledge errors with time-segmented pre-training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Time-aware Mixture of Language Experts (TiMoE)
Pre-training on disjoint time slices
Causal routing with timestamp masking
Robin Faro
EPFL, Switzerland
Dongyang Fan
EPFL
Machine Learning, LLMs
Tamar Alphaidze
EPFL, Switzerland
Martin Jaggi
EPFL
Machine Learning, Optimization