🤖 AI Summary
This work addresses how to leverage abundant unlabeled data to enhance in-context learning (ICL) performance in large language models. The authors propose an augmented ICL framework that includes, within the prompt, a small number of labeled examples alongside a large pool of unlabeled inputs. By employing chain-of-thought (CoT) prompting, the framework guides a multi-layer transformer to implicitly implement the expectation–maximization (EM) algorithm, enabling effective collaboration between labeled and unlabeled data. The theoretical analysis gives the first formal proof that this mechanism provably improves ICL accuracy, and shows that a transformer trained via teacher forcing has parameters converging to the desired solution at a linear rate. Experiments consistently demonstrate superior performance over conventional few-shot ICL across multiple linear classification tasks.
📝 Abstract
Large language models (LLMs) exhibit impressive in-context learning (ICL) capabilities, yet the quality of their predictions is fundamentally limited by the few costly labeled demonstrations that can fit into a prompt. Meanwhile, there exist vast and continuously growing amounts of unlabeled data that may be closely related to the ICL task. How to utilize such unlabeled data to provably enhance the performance of ICL thus becomes an emerging fundamental question. In this work, we propose a novel augmented ICL framework, in which the prompt includes a small set of labeled examples alongside a block of unlabeled inputs. We focus on the multi-class linear classification setting and demonstrate that, with chain-of-thought (CoT) prompting, a multi-layer transformer can effectively emulate an expectation-maximization (EM) algorithm. This enables the transformer to implicitly extract useful information from both labeled and unlabeled data, leading to provable improvements in ICL accuracy. Moreover, we show that such a transformer can be trained via teacher forcing, with its parameters converging to the desired solution at a linear rate. Experiments demonstrate that the augmented ICL framework consistently outperforms conventional few-shot ICL, providing empirical support for our theoretical findings. To the best of our knowledge, this is the first theoretical study on the impact of unlabeled data on the ICL performance of transformers.
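To make the core mechanism concrete, the sketch below shows the classical semi-supervised EM loop that the paper argues a CoT-prompted transformer can emulate: a few labeled points fix their class assignments, while unlabeled points contribute soft responsibilities that refine the class means. This is a minimal stand-in (unit-variance Gaussian classes, uniform prior, synthetic data), not the paper's actual construction; all variable names and modeling choices here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-class linear classification data: two unit-variance Gaussian clusters.
K, d = 2, 2
true_means = np.array([[2.0, 0.0], [-2.0, 0.0]])
n_lab, n_unlab = 4, 200

y_lab = np.array([0, 0, 1, 1])
X_lab = true_means[y_lab] + rng.normal(size=(n_lab, d))
y_unlab = rng.integers(0, K, size=n_unlab)
X_unlab = true_means[y_unlab] + rng.normal(size=(n_unlab, d))

# Initialize class means from the few labeled examples alone.
means = np.array([X_lab[y_lab == k].mean(axis=0) for k in range(K)])

for _ in range(20):
    # E-step: soft responsibilities for unlabeled points
    # (proportional to the Gaussian likelihood of each class).
    sq_dist = ((X_unlab[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)
    logits = -0.5 * sq_dist
    resp = np.exp(logits - logits.max(axis=1, keepdims=True))
    resp /= resp.sum(axis=1, keepdims=True)

    # M-step: update class means using hard labels for labeled data
    # and soft responsibilities for unlabeled data.
    weights = np.vstack([np.eye(K)[y_lab], resp])   # (n_lab + n_unlab, K)
    X_all = np.vstack([X_lab, X_unlab])
    means = (weights.T @ X_all) / weights.sum(axis=0)[:, None]

# With many unlabeled points, the estimated means should be much closer
# to the truth than estimates from the 4 labeled points alone.
err = np.linalg.norm(means - true_means)
```

The key point mirrored from the paper: the labeled examples only anchor the class identities, while the statistical strength comes from the unlabeled pool processed by the EM iterations.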