Investigation into In-Context Learning Capabilities of Transformers

📅 2026-04-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work systematically investigates the conditions under which Transformers succeed at in-context learning (ICL), focusing on binary classification tasks derived from high-dimensional Gaussian mixtures. Using controlled synthetic experiments within a linear in-context classifier framework and extensive parameter sweeps, the study analyzes how input dimensionality, context length, signal-to-noise ratio, and the number of pretraining tasks jointly influence ICL test accuracy. It maps the empirical scaling behavior of ICL across data geometries and degrees of task exposure during pretraining, and identifies the parameter regions in which benign overfitting emerges, where the model memorizes noisy in-context labels yet still generalizes well on clean test data. The findings highlight the critical role of dimensionality, signal strength, and contextual information in determining when ICL succeeds and when it fails.

📝 Abstract

Transformers have demonstrated a strong ability for in-context learning (ICL), enabling models to solve previously unseen tasks using only example input-output pairs provided at inference time. While prior theoretical work has established conditions under which transformers can perform linear classification in-context, the empirical scaling behavior governing when this mechanism succeeds remains insufficiently characterized. In this paper, we conduct a systematic empirical study of in-context learning for Gaussian-mixture binary classification tasks. Building on the theoretical framework of Frei and Vardi (2024), we analyze how in-context test accuracy depends on three fundamental factors: the input dimension, the number of in-context examples, and the number of pre-training tasks. Using a controlled synthetic setup and a linear in-context classifier formulation, we isolate the geometric conditions under which models successfully infer task structure from context alone. We additionally investigate the emergence of benign overfitting, where models memorize noisy in-context labels while still achieving strong generalization performance on clean test data. Through extensive sweeps across dimensionality, sequence length, task diversity, and signal-to-noise regimes, we identify the parameter regions in which this phenomenon arises and characterize how it depends on data geometry and training exposure. Our results provide a comprehensive empirical map of scaling behavior in in-context classification, highlighting the critical role of dimensionality, signal strength, and contextual information in determining when in-context learning succeeds and when it fails.
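
To make the setup in the abstract concrete, here is a minimal sketch of a Gaussian-mixture in-context classification task with a simple linear in-context classifier. It is an illustration under stated assumptions, not the paper's implementation: it assumes inputs of the form x = y * mu + standard Gaussian noise, with a fresh class mean mu per task; it assumes the classifier simply averages label-weighted context inputs; and it models noisy in-context labels by flipping each one with probability flip_prob. All function names and parameters (sample_task, sample_example, fit_in_context, icl_accuracy, signal, flip_prob) are hypothetical.

```python
import math
import random


def sample_task(d, signal, rng):
    """Draw a class-mean vector mu in R^d with Euclidean norm `signal` (assumed task prior)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in g))
    return [signal * v / norm for v in g]


def sample_example(mu, rng, flip_prob=0.0):
    """Return (x, observed_label, clean_label) with x = y * mu + N(0, I) noise."""
    y = rng.choice((-1, 1))
    x = [y * m + rng.gauss(0.0, 1.0) for m in mu]
    # With probability flip_prob the label shown in context is flipped (label noise).
    observed = -y if rng.random() < flip_prob else y
    return x, observed, y


def fit_in_context(context):
    """Linear in-context classifier: w = (1/N) * sum_i y_i * x_i over the context."""
    n = len(context)
    d = len(context[0][0])
    w = [0.0] * d
    for x, y_obs, _ in context:
        for j in range(d):
            w[j] += y_obs * x[j] / n
    return w


def predict(w, x):
    """Predict +1 or -1 from the sign of the inner product <w, x>."""
    score = sum(wj * xj for wj, xj in zip(w, x))
    return 1 if score >= 0 else -1


def icl_accuracy(d, n_context, signal, flip_prob, n_tasks=200, seed=0):
    """Average clean-label accuracy on one query per task, over n_tasks fresh tasks."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_tasks):
        mu = sample_task(d, signal, rng)
        context = [sample_example(mu, rng, flip_prob) for _ in range(n_context)]
        w = fit_in_context(context)
        x_query, _, y_clean = sample_example(mu, rng)  # query carries a clean label
        correct += int(predict(w, x_query) == y_clean)
    return correct / n_tasks


if __name__ == "__main__":
    # Sweep the signal strength at fixed dimension and context length.
    for signal in (0.0, 1.0, 2.0, 4.0):
        acc = icl_accuracy(d=20, n_context=40, signal=signal, flip_prob=0.1)
        print(f"signal={signal:.1f}  accuracy={acc:.3f}")
```

Sweeping d, n_context, signal, and flip_prob in this toy version gives a rough feel for the kind of parameter map the abstract describes. It does not model the number of pre-training tasks, since no model is trained here.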

Problem

Research questions and friction points this paper is trying to address.

in-context learning

transformers

scaling behavior

benign overfitting

Gaussian-mixture classification

Innovation

Methods, ideas, or system contributions that make the work stand out.

in-context learning

benign overfitting

scaling behavior