Understanding the Generalization of In-Context Learning in Transformers: An Empirical Study

📅 2025-03-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the poorly understood generalization boundaries and pronounced cross-task fragility of large language models (e.g., GPT-4, LLaMA-3) in in-context learning (ICL). We propose the first task-centric, three-dimensional ICL generalization framework (spanning inter-problem, intra-problem, and intra-task levels) and conduct a systematic empirical evaluation across diverse tasks, including function fitting, API calling, and translation. Our findings reveal that Transformers exhibit strong intra-task and intra-problem generalization but severely lack inter-problem generalization. Crucially, increasing task diversity in the training data significantly improves ICL generalization to unseen tasks; task-mixed training yields up to a 37% absolute accuracy gain. These results provide actionable, empirically grounded guidance for ICL data curation and model pretraining, along with a unified evaluation protocol for rigorous benchmarking of ICL generalization capabilities.

📝 Abstract
Large language models (LLMs) like GPT-4 and LLaMA-3 utilize the powerful in-context learning (ICL) capability of the Transformer architecture to learn on the fly from limited examples. While ICL underpins many LLM applications, its full potential remains hindered by a limited understanding of its generalization boundaries and vulnerabilities. We present a systematic investigation of Transformers' generalization capability with ICL relative to training data coverage by defining a task-centric framework along three dimensions: inter-problem, intra-problem, and intra-task generalization. Through extensive simulation and real-world experiments encompassing tasks such as function fitting, API calling, and translation, we find that Transformers lack inter-problem generalization with ICL, but excel at intra-task and intra-problem generalization. Including a greater variety of mixed tasks in the training data significantly enhances the generalization ability of ICL on unseen tasks, and even on known simple tasks. This suggests designing training data to maximize the diversity of tasks covered and to combine different tasks whenever possible, rather than focusing solely on the target task used at test time.
Problem

Research questions and friction points this paper is trying to address.

Investigates generalization boundaries of in-context learning in transformers.
Explores inter-problem, intra-problem, and intra-task generalization capabilities.
Identifies training data diversity as key to enhancing ICL generalization.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Systematic investigation of ICL generalization boundaries
Task-centric framework for three generalization dimensions
Enhanced ICL generalization with diverse training data
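The data-curation takeaway above (mix diverse task families in the training data rather than training on the target task alone) can be sketched as a minimal data-generation routine for the paper's function-fitting setting. The function families, prompt format, and helper names below are illustrative assumptions, not the authors' exact protocol:

```python
import random
import math

# Illustrative task families for ICL function fitting (assumed, not the
# paper's exact set): each family maps sampled parameters to a function.
TASK_FAMILIES = {
    "linear": lambda a, b: (lambda x: a * x + b),
    "quadratic": lambda a, b: (lambda x: a * x * x + b),
    "sine": lambda a, b: (lambda x: a * math.sin(x) + b),
}

def make_icl_sequence(family, n_examples=8, seed=None):
    """Sample one in-context sequence of (x, f(x)) pairs from a task family."""
    rng = random.Random(seed)
    a, b = rng.uniform(-2, 2), rng.uniform(-2, 2)
    f = TASK_FAMILIES[family](a, b)
    xs = [rng.uniform(-3, 3) for _ in range(n_examples)]
    return [(x, f(x)) for x in xs]

def build_training_set(families, n_sequences=1000, seed=0):
    """Task-mixed curriculum: each sequence is drawn from a random family."""
    rng = random.Random(seed)
    return [
        make_icl_sequence(rng.choice(families), seed=rng.random())
        for _ in range(n_sequences)
    ]

# Single-task vs. task-mixed training data, per the paper's finding that
# the mixed variant generalizes better to unseen tasks:
single_task = build_training_set(["linear"])
task_mixed = build_training_set(list(TASK_FAMILIES))
```

Under this sketch, evaluating a model trained on `single_task` versus `task_mixed` on a held-out family (e.g., an exponential family not in `TASK_FAMILIES`) would correspond to the paper's inter-problem generalization test.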
Xingxuan Zhang
Postdoctoral Research Scientist at Department of Computer Science, Tsinghua University
Computer Vision, OOD Generalization, Domain Generalization, Optimization
Haoran Wang
Tsinghua University
Jiansheng Li
Tsinghua University
Yuan Xue
Tsinghua University
Shikai Guan
Tsinghua University
Renzhe Xu
Assistant Professor of Computer Science, Shanghai University of Finance and Economics
Algorithmic Game Theory, Sequential Decision Making
Hao Zou
Tsinghua University
Han Yu
Tsinghua University
Peng Cui
Tsinghua University