Strategy Coopetition Explains the Emergence and Transience of In-Context Learning

📅 2025-03-07
🤖 AI Summary
This paper investigates why in-context learning (ICL) spontaneously emerges and then decays during Transformer training. Method: The authors introduce the concept of "strategy coopetition," describing a simultaneously competitive and cooperative, shared-subcircuit relationship between ICL and context-constrained in-weights learning (CIWL). They construct a minimal mathematical model formalizing this mechanism and design a training setup in which ICL persists. Contribution/Results: Through mechanistic analysis, dynamical modeling, attribution-based circuit localization, and controlled task-distillation experiments, the work explains ICL's emergence and decay, shows how ICL can be made stable rather than transient, and empirically validates CIWL as the asymptotically dominant learning strategy across diverse settings.

📝 Abstract
In-context learning (ICL) is a powerful ability that emerges in transformer models, enabling them to learn from context without weight updates. Recent work has established emergent ICL as a transient phenomenon that can sometimes disappear after long training times. In this work, we sought a mechanistic understanding of these transient dynamics. Firstly, we find that, after the disappearance of ICL, the asymptotic strategy is a remarkable hybrid between in-weights and in-context learning, which we term "context-constrained in-weights learning" (CIWL). CIWL is in competition with ICL, and eventually replaces it as the dominant strategy of the model (thus leading to ICL transience). However, we also find that the two competing strategies actually share sub-circuits, which gives rise to cooperative dynamics as well. For example, in our setup, ICL is unable to emerge quickly on its own, and can only be enabled through the simultaneous slow development of asymptotic CIWL. CIWL thus both cooperates and competes with ICL, a phenomenon we term "strategy coopetition." We propose a minimal mathematical model that reproduces these key dynamics and interactions. Informed by this model, we were able to identify a setup where ICL is truly emergent and persistent.
Problem

Research questions and friction points this paper is trying to address.

Understanding transient dynamics of in-context learning (ICL) in transformers.
Exploring competition and cooperation between ICL and context-constrained in-weights learning (CIWL).
Developing a mathematical model to explain and predict ICL emergence and persistence.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid learning strategy: context-constrained in-weights learning
Strategy coopetition: cooperation and competition dynamics
Mathematical model for emergent and persistent ICL
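The coopetition dynamics described above can be illustrated with a toy simulation. This is a hypothetical sketch for intuition only, not the paper's actual mathematical model: two scalar "strategy strengths" evolve under coupled dynamics in which the slowly developing CIWL strategy first boosts ICL (cooperation via shared sub-circuits) and then, as it saturates, suppresses it (competition), reproducing the emerge-then-decay shape of ICL. The rate constants are arbitrary choices.

```python
# Toy illustration of "strategy coopetition" (hypothetical, not the
# paper's model): CIWL grows logistically toward dominance, while ICL
# is driven up by CIWL's growth phase and driven back down once CIWL
# saturates.

def simulate(steps=5000, dt=0.01):
    icl, ciwl = 0.0, 0.01  # strategy strengths, roughly in [0, 1]
    history = []
    for _ in range(steps):
        # CIWL develops slowly but monotonically (asymptotic strategy).
        d_ciwl = 0.5 * ciwl * (1.0 - ciwl)
        # ICL is fed by CIWL's growth phase (cooperation) but
        # suppressed by mature CIWL (competition).
        d_icl = 2.0 * ciwl * (1.0 - ciwl) - 1.5 * ciwl * icl
        ciwl += dt * d_ciwl
        icl += dt * d_icl
        history.append((icl, ciwl))
    return history

hist = simulate()
peak_icl = max(icl for icl, _ in hist)
final_icl, final_ciwl = hist[-1]
```

In this sketch ICL rises to a transient peak and then decays toward zero while CIWL approaches its asymptote, mirroring the transience the paper reports; removing the shared-growth term would prevent ICL from emerging at all, mirroring the cooperative dependence.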
Authors

Aaditya K. Singh (Gatsby Computational Neuroscience Unit, University College London)
Ted Moskovitz (Anthropic)
Sara Dragutinovic (University of Oxford)
Felix Hill (Research Scientist, Google DeepMind)
Stephanie C.Y. Chan (Research Scientist, Google DeepMind)
Andrew M. Saxe (Gatsby Computational Neuroscience Unit, University College London)