Task Generalization With AutoRegressive Compositional Structure: Can Learning From $d$ Tasks Generalize to $d^{T}$ Tasks?

📅 2025-02-13
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses the "train on few tasks, generalize to exponentially many" problem in task generalization: can a model trained on only $d$ base tasks generalize to a task family of size $d^T$, generated by $T$-step AutoRegressive Composition (ARC)? We propose the ARC modeling framework and provide the first theoretical guarantee, showing that $\tilde{O}(d)$ training tasks suffice for generalization to the full family. Empirically, we demonstrate that Transformers achieve exponential task generalization on sparse parity benchmarks via in-context learning (ICL) and chain-of-thought (CoT) reasoning, and successfully transfer this capability to arithmetic operations (addition, subtraction, multiplication, division) and multilingual translation, validating ARC's cross-domain efficacy across logical, numerical, and semantic tasks. Our core contribution is the first theoretical model supporting exponential task generalization, coupled with evidence that large language models implicitly learn ARC structure during training.

📝 Abstract
Large language models (LLMs) exhibit remarkable task generalization, solving tasks they were never explicitly trained on with only a few demonstrations. This raises a fundamental question: When can learning from a small set of tasks generalize to a large task family? In this paper, we investigate task generalization through the lens of AutoRegressive Compositional (ARC) structure, where each task is a composition of $T$ operations, and each operation is among a finite family of $d$ subtasks. This yields a total class of size $d^T$. We first show that generalization to all $d^T$ tasks is theoretically achievable by training on only $\tilde{O}(d)$ tasks. Empirically, we demonstrate that Transformers achieve such exponential task generalization on sparse parity functions via in-context learning (ICL) and Chain-of-Thought (CoT) reasoning. We further demonstrate this generalization in arithmetic and language translation, extending beyond parity functions.
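The counting in the abstract can be made concrete: with $d$ base operations composed autoregressively over $T$ steps, each task is a length-$T$ sequence of subtask indices, so the family has exactly $d^T$ members. A minimal Python sketch of this ARC structure, using toy integer operations as illustrative stand-ins (not the paper's parity subtasks):

```python
from itertools import product

# d base subtasks acting on an integer state; these toy operations
# are illustrative stand-ins for the paper's parity-style subtasks.
subtasks = [lambda x: x + 1, lambda x: 2 * x, lambda x: x ** 2]
d, T = len(subtasks), 4

def run_task(indices, x0):
    """Apply a task, i.e. a length-T index sequence, autoregressively:
    state_{t+1} = f_{i_t}(state_t)."""
    x = x0
    for i in indices:
        x = subtasks[i](x)
    return x

# The task family is the set of all index sequences: exactly d**T tasks.
all_tasks = list(product(range(d), repeat=T))
assert len(all_tasks) == d ** T  # 3**4 = 81 tasks from only 3 subtasks
```

The point of the theory is that a learner who identifies the $d$ subtasks (here, the three lambdas) from roughly $\tilde{O}(d)$ training tasks can in principle execute any of the $d^T$ compositions.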
Problem

Research questions and friction points this paper is trying to address.

Generalization from few tasks to many
AutoRegressive Compositional structure analysis
Exponential task generalization in Transformers
Innovation

Methods, ideas, or system contributions that make the work stand out.

AutoRegressive Compositional structure
In-context learning adaptation
Chain-of-Thought reasoning application
Amirhesam Abedsoltan
University of California San Diego
Artificial Intelligence · Machine Learning · Deep Learning
Huaqing Zhang
Institute for Interdisciplinary Information Sciences, Tsinghua University
Kaiyue Wen
PhD Student, Stanford University
Machine Learning · Natural Language Processing
Hongzhou Lin
Amazon
Artificial Intelligence · LLM · Optimization · Theory of Deep Learning
Jingzhao Zhang
Institute for Interdisciplinary Information Sciences, Tsinghua University
Mikhail Belkin
Department of Computer Science and Engineering, UC San Diego; Halicioglu Data Science Institute, UC San Diego