🤖 AI Summary
State-of-the-art multilingual text classification models overlook temporal distributional shift—caused by data evolution over time—leading to substantial degradation in cross-year generalization. This work pioneers the conceptualization of time as an implicit domain (e.g., 2024 vs. 2025) and introduces a temporally aware cross-lingual generalization framework. Its core innovation is the Mixture of Temporal Experts (MoTE) architecture, which jointly models semantic evolution and temporal distributional shift through integrated domain adaptation, Mixture-of-Experts (MoE) routing, temporal distribution alignment, and fine-tuning of multilingual pretrained language models. Evaluated on a multilingual, multi-year benchmark, the framework demonstrates strong temporal sensitivity and achieves an average accuracy improvement of 3.2% over strong baselines. It significantly enhances model robustness and adaptability to future data distributions while preserving cross-lingual transfer capability.
📝 Abstract
Time is implicitly embedded in the classification process: classifiers are usually trained on existing data but applied to future data, whose distributions (e.g., label and token distributions) may change. However, existing state-of-the-art classification models rarely consider these temporal variations and primarily focus on English corpora, leaving temporal effects underexplored, let alone in multilingual settings. In this study, we fill this gap by treating time periods as domains (e.g., 2024 vs. 2025), examining temporal effects, and developing a domain adaptation framework that generalizes classifiers over time across multiple languages. Our framework proposes Mixture of Temporal Experts (MoTE), which leverages both semantic and data distributional shifts to learn temporal trends and adapt classification models accordingly. Our analysis shows that classification performance varies over time and across languages, and we experimentally demonstrate that MoTE enhances classifier generalizability under temporal data shifts. Our study provides analytic insights and addresses the need for time-aware models that perform robustly in multilingual scenarios.
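The core idea described above (time periods as domains, with a gating network mixing period-specific experts) can be sketched in a toy form. This is a minimal illustration, not the paper's implementation: the class name, the linear experts, and the softmax gate are assumptions chosen to show the routing mechanism; the actual MoTE operates on top of multilingual pretrained language models.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

class MixtureOfTemporalExperts:
    """Toy MoTE head (illustrative, not the paper's architecture):
    one linear expert per time period (e.g. 2024, 2025) plus a gate
    that mixes the experts' scores based on the input features."""

    def __init__(self, expert_weights, gate_weights):
        # expert_weights: {period: weight vector}, one linear scorer per period
        self.expert_weights = expert_weights
        # gate_weights: {period: weight vector}, producing one gate logit each
        self.gate_weights = gate_weights

    def forward(self, x):
        periods = sorted(self.expert_weights)
        # Gate decides how much each temporal expert contributes to this input.
        gate = softmax([dot(self.gate_weights[p], x) for p in periods])
        # Final score is the gate-weighted mixture of per-period expert scores.
        score = sum(g * dot(self.expert_weights[p], x)
                    for g, p in zip(gate, periods))
        return score, dict(zip(periods, gate))

# Two hypothetical temporal experts over 3-dimensional features.
mote = MixtureOfTemporalExperts(
    expert_weights={2024: [1.0, 0.0, -0.5], 2025: [0.2, 0.8, 0.3]},
    gate_weights={2024: [0.5, -0.5, 0.0], 2025: [-0.5, 0.5, 0.0]},
)
score, gate = mote.forward([0.4, 1.2, 0.1])
```

In the full framework, training would also align the temporal distributions so that the gate routes future inputs toward the experts whose period they most resemble, preserving cross-lingual transfer while adapting to drift.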