🤖 AI Summary
This work investigates the dynamic trade-off between in-weights learning (IWL) and in-context learning (ICL) in Transformers, focusing on how environmental predictability, quantified via stability and cue reliability, modulates this balance. Drawing an analogy to the genetic-encoding versus phenotypic-plasticity trade-off in evolutionary biology, we formalize stability and cue reliability as continuous regulatory dimensions. We design controlled regression and classification experiments, integrating dynamic trajectory analysis and phase-switch detection to characterize IWL/ICL phase transitions and their task-dependent temporal evolution. Key findings: high stability induces a sharp IWL phase transition; high cue reliability, especially under low stability, enhances ICL; and we identify a non-canonical "IWL-first, ICL-later" learning sequence. Based on these results, we propose the relative-cost hypothesis, offering a novel conceptual framework for adaptive model training strategies.
📝 Abstract
Transformer models learn in two distinct modes: in-weights learning (IWL), encoding knowledge into model weights, and in-context learning (ICL), adapting flexibly to context without weight modification. To better understand the interplay between these learning modes, we draw inspiration from evolutionary biology's analogous adaptive strategies: genetic encoding (akin to IWL, adapting over generations and fixed within an individual's lifetime) and phenotypic plasticity (akin to ICL, enabling flexible behavioral responses to environmental cues). In evolutionary biology, environmental predictability dictates the balance between these strategies: stability favors genetic encoding, while reliable predictive cues promote phenotypic plasticity. We experimentally operationalize these dimensions of predictability and systematically investigate their influence on the ICL/IWL balance in Transformers. Using regression and classification tasks, we show that high environmental stability decisively favors IWL, as predicted, with a sharp transition at maximal stability. Conversely, high cue reliability enhances ICL efficacy, particularly when stability is low. Furthermore, learning dynamics reveal task-contingent temporal evolution: while a canonical ICL-to-IWL shift occurs in some settings (e.g., classification with many classes), we demonstrate that scenarios with easier IWL (e.g., fewer classes) or slower ICL acquisition (e.g., regression) can exhibit an initial IWL phase that later yields to ICL dominance. These findings support a relative-cost hypothesis for learning-mode transitions, establish predictability as a critical factor governing adaptive strategies in Transformers, and offer novel insights for understanding ICL and guiding training methodologies.
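The two predictability dimensions can be illustrated with a minimal toy generative process (a sketch under our own assumptions; the paper's actual experimental setup, task distributions, and parameter names may differ): `stability` is the probability that the latent task repeats across episodes, and `cue_reliability` is the probability that the context cue correctly indicates the current task.

```python
import random

def make_environment(stability, cue_reliability, n_tasks=4, n_episodes=1000, seed=0):
    """Toy data-generating process parameterized by the two
    predictability dimensions discussed above (illustrative only).

    stability       : P(latent task stays the same between episodes)
    cue_reliability : P(context cue matches the true task)
    Returns a list of (cue, task) pairs.
    """
    rng = random.Random(seed)
    task = rng.randrange(n_tasks)
    episodes = []
    for _ in range(n_episodes):
        if rng.random() > stability:        # environment shifts to a new task
            task = rng.randrange(n_tasks)
        if rng.random() < cue_reliability:  # cue is informative
            cue = task
        else:                               # cue is noise
            cue = rng.randrange(n_tasks)
        episodes.append((cue, task))
    return episodes

# At stability=1.0 the task never changes, so encoding it in the weights
# suffices (favoring IWL); at low stability with perfectly reliable cues,
# the context is the only dependable signal (favoring ICL).
eps_stable = make_environment(stability=1.0, cue_reliability=0.5)
eps_cueful = make_environment(stability=0.1, cue_reliability=1.0)
```

Under this sketch, sweeping `stability` and `cue_reliability` over [0, 1] yields the two-dimensional grid of environments on which the ICL/IWL balance can be measured.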