Competition Dynamics Shape Algorithmic Phases of In-Context Learning

📅 2024-12-01
📈 Citations: 1
Influential: 0
🤖 AI Summary
Existing in-context learning (ICL) studies lack a unified task framework, which hinders systematic investigation of ICL's fundamental mechanisms and universal principles. To address this, the authors propose a synthetic sequence modeling task over mixtures of Markov chains, casting ICL as a dynamic competition among four primitive algorithms defined by the Cartesian product of two strategies (fuzzy retrieval vs. inference) and two context statistics (unigram vs. bigram). Through behavioral decomposition, competitive dynamical modeling, and interpretability analysis of trained models, they quantitatively characterize sharp phase transitions in algorithmic dominance as functions of context length and training scale, reproducing canonical ICL phenomena. Key contributions: (1) establishing ICL as an emergent, phase-transition-like process driven by multi-algorithm competition rather than a monolithic mechanism; and (2) introducing the first unified explanatory framework that accounts for ICL's transient nature and strong contextual dependence.
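The 2×2 taxonomy in the summary can be made concrete with a tiny sketch: the four primitive algorithms are just the Cartesian product of the strategy axis and the statistics axis. The label strings here are our own shorthand, not identifiers from the paper.

```python
from itertools import product

# Two axes described in the paper's framework.
strategies = ["fuzzy-retrieval", "inference"]
statistics = ["unigram", "bigram"]

# The four primitive algorithms are all strategy/statistic pairings.
algorithms = [f"{strat} × {stat}" for strat, stat in product(strategies, statistics)]
print(algorithms)
# → ['fuzzy-retrieval × unigram', 'fuzzy-retrieval × bigram',
#    'inference × unigram', 'inference × bigram']
```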

📝 Abstract
In-Context Learning (ICL) has significantly expanded the general-purpose nature of large language models, allowing them to adapt to novel tasks using merely the inputted context. This has motivated a series of papers that analyze tractable synthetic domains and postulate precise mechanisms that may underlie ICL. However, the use of relatively distinct setups that often lack a sequence modeling nature to them makes it unclear how general the reported insights from such studies are. Motivated by this, we propose a synthetic sequence modeling task that involves learning to simulate a finite mixture of Markov chains. As we show, models trained on this task reproduce most well-known results on ICL, hence offering a unified setting for studying the concept. Building on this setup, we demonstrate we can explain a model's behavior by decomposing it into four broad algorithms that combine a fuzzy retrieval vs. inference approach with either unigram or bigram statistics of the context. These algorithms engage in a competition dynamics to dominate model behavior, with the precise experimental conditions dictating which algorithm ends up superseding others: e.g., we find merely varying context size or amount of training yields (at times sharp) transitions between which algorithm dictates the model behavior, revealing a mechanism that explains the transient nature of ICL. In this sense, we argue ICL is best thought of as a mixture of different algorithms, each with its own peculiarities, instead of a monolithic capability. This also implies that making general claims about ICL that hold universally across all settings may be infeasible.
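The abstract's synthetic task (sequences drawn from a finite mixture of Markov chains) and the unigram-vs-bigram statistics axis can be sketched as follows. This is a minimal illustration, not the paper's implementation: the vocabulary size, number of chains, sequence length, and add-one smoothing are our own assumptions, and the retrieval-vs-inference axis (matching the context back to a known chain rather than estimating statistics from scratch) is not implemented here.

```python
import random
from collections import Counter

VOCAB = 5      # token alphabet size (illustrative choice)
N_CHAINS = 3   # number of Markov chains in the mixture (illustrative)

def random_transition_matrix(rng):
    """A random row-stochastic matrix: one next-token distribution per token."""
    rows = []
    for _ in range(VOCAB):
        w = [rng.random() for _ in range(VOCAB)]
        s = sum(w)
        rows.append([x / s for x in w])
    return rows

rng = random.Random(0)
chains = [random_transition_matrix(rng) for _ in range(N_CHAINS)]

def sample_sequence(length, rng):
    """Pick one chain from the mixture, then roll out a token sequence from it."""
    k = rng.randrange(N_CHAINS)
    T = chains[k]
    seq = [rng.randrange(VOCAB)]
    for _ in range(length - 1):
        seq.append(rng.choices(range(VOCAB), weights=T[seq[-1]])[0])
    return k, seq

def unigram_predictor(context):
    """In-context unigram statistics: predict from overall token frequencies."""
    counts = Counter(context)
    total = len(context)
    return [counts[t] / total for t in range(VOCAB)]

def bigram_predictor(context):
    """In-context bigram statistics: the empirical transition row for the last
    token, with add-one smoothing so unseen transitions stay nonzero."""
    counts = [1.0] * VOCAB
    last = context[-1]
    for a, b in zip(context, context[1:]):
        if a == last:
            counts[b] += 1
    total = sum(counts)
    return [c / total for c in counts]

k, seq = sample_sequence(200, rng)
p_uni = unigram_predictor(seq)
p_bi = bigram_predictor(seq)
```

With long contexts the bigram estimate converges to the sampled chain's true transition row, while the unigram estimate only tracks the stationary distribution; the paper's point is that a trained model's behavior can flip between such predictors as context length or training scale varies.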
Problem

Research questions and friction points this paper is trying to address.

- Contextual Learning
- Competitive Scenarios
- Universality

Innovation

Methods, ideas, or system contributions that make the work stand out.

- Sequential Learning
- Multi-strategy Analysis
- Contextual Learning Variability
Core Francisco Park
Harvard University
AI for Science; Science of Deep Learning
Ekdeep Singh Lubana
Goodfire AI
AI; Machine Learning; Deep Learning
Itamar Pres
EECS Department, University of Michigan, Ann Arbor
Hidenori Tanaka
CBS-NTT Program in Physics of Intelligence, Harvard University; Physics & Informatics Laboratories, NTT Research, Inc.