Temporal Task Diversity: Inductive Biases Under Non-Stationarity in Synthetic Sequence Modelling

📅 2026-05-18
📈 Citations: 0
Influential: 0

🤖 AI Summary
This study investigates how non-stationarity in the training data distribution influences the inductive biases of deep learning models, focusing on the trade-off between generalization and memorization. The authors use in-context linear regression sequence modeling as a testbed: small Transformers are trained on synthetic data whose task distribution is diversified across training time rather than held fixed. They find that such temporal task diversity leads to an increased bias towards generalization over memorization, a result that bears on the structural, generalization, and safety properties of the learned models.
📝 Abstract
Modern deep learning science often assumes that neural networks learn from a fixed data distribution. However, many practically important learning problems involve data distributions that change throughout training. How does such non-stationarity impact the inductive biases of deep learning towards models with different structural, generalisation, and safety properties? A fruitful testbed for studying inductive bias is in-context linear regression sequence modelling, where small transformers display strikingly different generalisation patterns depending on the diversity of the (fixed) training task distribution. In this paper, we explore the effect of diversifying the task distribution across training time, finding that such temporal diversity leads to an increased bias towards generalisation over memorisation.
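To make the testbed concrete, the following is a minimal sketch of how in-context linear regression data with a time-varying task distribution might be generated. It is an illustration under stated assumptions, not the authors' setup: the finite task pool, the linear growth schedule, and the names `sample_batch` and `active_tasks` are all hypothetical, as are the dimensions and noise level.

```python
import numpy as np


def sample_batch(rng, task_pool, num_tasks, batch_size=8, context_len=16, noise_std=0.1):
    """Sample in-context regression sequences from the first `num_tasks` tasks.

    Each task is a weight vector w; each sequence is a set of (x, y) pairs
    with y = <w, x> + Gaussian noise. All defaults are illustrative assumptions.
    """
    dim = task_pool.shape[1]
    # One task per sequence, drawn uniformly from the currently active tasks.
    task_ids = rng.integers(0, num_tasks, size=batch_size)
    w = task_pool[task_ids]                                  # (batch, dim)
    x = rng.standard_normal((batch_size, context_len, dim))  # (batch, context, dim)
    noise = noise_std * rng.standard_normal((batch_size, context_len))
    y = np.einsum("bkd,bd->bk", x, w) + noise                # (batch, context)
    return x, y, task_ids


def active_tasks(step, total_steps, min_tasks, max_tasks):
    """Hypothetical schedule: the number of active tasks grows linearly with the step."""
    frac = step / max(total_steps - 1, 1)
    return int(round(min_tasks + frac * (max_tasks - min_tasks)))


rng = np.random.default_rng(0)
task_pool = rng.standard_normal((64, 4))  # assumed pool: 64 tasks in 4 dimensions

total_steps = 5
schedule = [active_tasks(s, total_steps, min_tasks=2, max_tasks=64) for s in range(total_steps)]

# A stationary baseline would pass the same `num_tasks` at every step;
# here the task distribution changes as training proceeds.
for step, num_tasks in enumerate(schedule):
    x, y, task_ids = sample_batch(rng, task_pool, num_tasks)
    print(step, num_tasks, x.shape, y.shape)
```

Holding `num_tasks` constant in the loop recovers the fixed-distribution setting that the abstract contrasts against; other schedules (for example, swapping in a fresh small task set at each phase) would fit the same interface.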
Problem

Research questions and friction points this paper is trying to address.

non-stationarity
inductive biases
temporal task diversity
generalization
sequence modelling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Temporal Task Diversity
Non-Stationarity
Inductive Bias
In-Context Learning
Generalization vs Memorization