🤖 AI Summary
State-of-the-art multilingual text classification models overlook temporal distributional shift—caused by data evolution over time—leading to substantial degradation in cross-year generalization. This work pioneers the conceptualization of time as an implicit domain (e.g., 2024 vs. 2025) and introduces a temporally aware cross-lingual generalization framework. Its core innovation is the Mixture of Temporal Experts (MoTE) architecture, which jointly models semantic evolution and temporal distributional shift through integrated domain adaptation, Mixture-of-Experts (MoE) routing, temporal distribution alignment, and fine-tuning of multilingual pretrained language models. Evaluated on a multilingual, multi-year benchmark, the framework demonstrates strong temporal sensitivity and achieves an average accuracy improvement of 3.2% over strong baselines. It significantly enhances model robustness and adaptability to future data distributions while preserving cross-lingual transfer capability.
📝 Abstract
Time is implicitly embedded in the classification process: classifiers are usually trained on existing data but applied to future data, whose distributions (e.g., label and token distributions) may change. However, existing state-of-the-art classification models rarely consider these temporal variations and primarily focus on English corpora, leaving temporal effects underexplored, let alone in multilingual settings. In this study, we fill this gap by treating time periods as domains (e.g., 2024 vs. 2025), examining temporal effects, and developing a domain adaptation framework that generalizes classifiers over time across multiple languages. Our framework proposes Mixture of Temporal Experts (MoTE), which leverages both semantic and data distributional shifts to learn temporal trends and adapt classification models accordingly. Our analysis shows that classification performance varies over time and across languages, and we experimentally demonstrate that MoTE enhances classifier generalizability under temporal data shifts. Our study provides analytic insights and addresses the need for time-aware models that perform robustly in multilingual scenarios.
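The core idea described above (time periods as domains, with a gating network mixing period-specific experts) can be sketched in a toy form. This is a minimal illustration, not the paper's implementation: the class name, the linear experts, and the softmax gate are assumptions chosen to show the routing mechanism; the actual MoTE operates on top of multilingual pretrained language models.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

class MixtureOfTemporalExperts:
    """Toy MoTE head (illustrative, not the paper's architecture):
    one linear expert per time period (e.g. 2024, 2025) plus a gate
    that mixes the experts' scores based on the input features."""

    def __init__(self, expert_weights, gate_weights):
        # expert_weights: {period: weight vector}, one linear scorer per period
        self.expert_weights = expert_weights
        # gate_weights: {period: weight vector}, producing one gate logit each
        self.gate_weights = gate_weights

    def forward(self, x):
        periods = sorted(self.expert_weights)
        # Gate decides how much each temporal expert contributes to this input.
        gate = softmax([dot(self.gate_weights[p], x) for p in periods])
        # Final score is the gate-weighted mixture of per-period expert scores.
        score = sum(g * dot(self.expert_weights[p], x)
                    for g, p in zip(gate, periods))
        return score, dict(zip(periods, gate))

# Two hypothetical temporal experts over 3-dimensional features.
mote = MixtureOfTemporalExperts(
    expert_weights={2024: [1.0, 0.0, -0.5], 2025: [0.2, 0.8, 0.3]},
    gate_weights={2024: [0.5, -0.5, 0.0], 2025: [-0.5, 0.5, 0.0]},
)
score, gate = mote.forward([0.4, 1.2, 0.1])
```

In the full framework, training would also align the temporal distributions so that the gate routes future inputs toward the experts whose period they most resemble, preserving cross-lingual transfer while adapting to drift.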