What Matters for In-Context Learning: A Balancing Act of Look-up and In-Weight Learning

📅 2025-01-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates the fundamental mechanism underlying in-context learning (ICL) in large language models, specifically how models leverage contextual examples to generalize. The authors construct controllable sequence datasets and train deep autoregressive models on them, complemented by systematic ablation studies and stability analyses. The key finding is that **conceptual repetition** (such as n-gram recurrence in text or exact image-patch copies) is a necessary precondition for the emergence of ICL, exerting a stronger influence than conventional distributional properties like burstiness or heavy-tailedness. Furthermore, the emergence of ICL hinges on a dynamic equilibrium between *in-weight learning* (parameter updates during pretraining) and *in-context solving ability* (inference conditioned on the context alone, without weight updates). Exploiting this balance improves ICL stability and cross-modal generalization, with experiments on both text and image sequence tasks confirming that repetitive structural patterns fundamentally underpin ICL learnability.
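The controlled setup the summary describes can be pictured with a toy sketch: generate a random token sequence and inject a fixed n-gram several times, creating the "conceptual repetition" condition. All function names and parameters below are illustrative assumptions, not the paper's actual data pipeline.

```python
import random

def make_sequence(vocab_size=50, length=64, ngram=(7, 8, 9), repeats=3, seed=0):
    """Build a toy token sequence and inject `ngram` at `repeats` positions.

    Illustrative only: the paper's controlled datasets are more elaborate.
    """
    rng = random.Random(seed)
    seq = [rng.randrange(vocab_size) for _ in range(length)]
    n = len(ngram)
    # Pick non-overlapping slots (starts are multiples of n) and overwrite
    # each with the same n-gram, producing guaranteed n-gram recurrence.
    slots = rng.sample(range(0, length - n, n), repeats)
    for start in slots:
        seq[start:start + n] = list(ngram)
    return seq

def count_ngram(seq, ngram):
    """Count (possibly overlapping) occurrences of `ngram` in `seq`."""
    n = len(ngram)
    return sum(1 for i in range(len(seq) - n + 1)
               if tuple(seq[i:i + n]) == tuple(ngram))
```

A sequence built this way contains the target n-gram at least `repeats` times, so the repetition condition can be toggled on or off while holding length and vocabulary fixed.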

📝 Abstract
Large Language Models (LLMs) have demonstrated impressive performance in various tasks, including In-Context Learning (ICL), where the model performs new tasks by conditioning solely on the examples provided in the context, without updating the model's weights. While prior research has explored the roles of pretraining data and model architecture, the key mechanism behind ICL remains unclear. In this work, we systematically uncover properties present in LLMs that support the emergence of ICL. To disambiguate these factors, we conduct a study with a controlled dataset and data sequences using a deep autoregressive model. We show that conceptual repetitions in the data sequences are crucial for ICL, more so than previously indicated training data properties like burstiness or long-tail distribution. Conceptual repetitions could refer to $n$-gram repetitions in textual data or exact image copies in image sequence data. Such repetitions also offer other previously overlooked benefits such as reduced transiency in ICL performance. Furthermore, we show that the emergence of ICL depends on balancing the in-weight learning objective with the in-context solving ability during training.
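The abstract contrasts conceptual repetition with distributional properties such as burstiness. A crude way to see what "burstiness" means here is to compare an i.i.d. class sequence with one emitted in runs; the sampler below is our own toy illustration, not the paper's data-generation procedure.

```python
import random

def sample_iid(num_classes, length, rng):
    """Classes drawn independently and uniformly: no burstiness."""
    return [rng.randrange(num_classes) for _ in range(length)]

def sample_bursty(num_classes, length, burst, rng):
    """Classes emitted in runs of `burst`: a crude stand-in for burstiness."""
    seq = []
    while len(seq) < length:
        seq.extend([rng.randrange(num_classes)] * burst)
    return seq[:length]

def adjacent_repeat_rate(seq):
    """Fraction of adjacent pairs with the same class (burstiness proxy)."""
    return sum(a == b for a, b in zip(seq, seq[1:])) / (len(seq) - 1)
```

The bursty sampler clusters identical classes in time without ever repeating a full multi-token pattern, which is exactly the distinction the abstract draws: burstiness alone is weaker for ICL than recurring n-grams or exact copies.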
Problem

Research questions and friction points this paper addresses:

- Large Language Models
- In-Context Learning
- Information Processing

Methods, ideas, or system contributions that make the work stand out:

- Repetitive Concepts
- In-Context Learning
- Stability Enhancement