🤖 AI Summary
This paper addresses the "few-shot training, large-scale compositional generalization" problem in task generalization: can a model trained on only a small number of base tasks generalize to a task family of size $d^T$, generated by $T$-step autoregressive composition (ARC) over $d$ subtasks? We propose the ARC modeling framework and provide the first theoretical guarantee, showing that $\tilde{O}(d)$ task examples suffice for complete generalization. Empirically, we demonstrate that Transformers achieve exponential task generalization on sparse parity benchmarks via in-context learning (ICL) and chain-of-thought (CoT) reasoning, and successfully transfer this capability to arithmetic operations (addition, subtraction, multiplication, division) and multilingual translation—validating ARC's cross-domain efficacy across logical, numerical, and semantic tasks. Our core contribution is the first theoretical model supporting exponential task generalization, coupled with evidence that large language models implicitly learn ARC structure during training.
📝 Abstract
Large language models (LLMs) exhibit remarkable task generalization, solving tasks they were never explicitly trained on with only a few demonstrations. This raises a fundamental question: When can learning from a small set of tasks generalize to a large task family? In this paper, we investigate task generalization through the lens of AutoRegressive Compositional (ARC) structure, where each task is a composition of $T$ operations, and each operation is among a finite family of $d$ subtasks. This yields a total class of size $d^T$. We first show that generalization to all $d^T$ tasks is theoretically achievable by training on only $\tilde{O}(d)$ tasks. Empirically, we demonstrate that Transformers achieve such exponential task generalization on sparse parity functions via in-context learning (ICL) and Chain-of-Thought (CoT) reasoning. We further demonstrate this generalization in arithmetic and language translation, extending beyond parity functions.
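The combinatorics behind the ARC structure can be made concrete with a small sketch. This is not the paper's construction, only a hypothetical toy instantiation: each subtask is a simple Boolean operation on a short bit-vector state, a task is a $T$-step composition of subtasks, and the intermediate states play the role of chain-of-thought steps. With $d$ subtasks and depth $T$, the task family has $d^T$ members.

```python
import itertools

# Hypothetical toy ARC instantiation: d = 3 subtasks acting on a 3-bit state.
SUBTASKS = {
    "xor01": lambda s: (s[0] ^ s[1], s[1], s[2]),
    "xor12": lambda s: (s[0], s[1] ^ s[2], s[2]),
    "flip0": lambda s: (1 - s[0], s[1], s[2]),
}
T = 4  # composition depth

def run_task(op_names, state):
    """Apply the chosen subtasks autoregressively, recording every
    intermediate state (the chain-of-thought trace)."""
    trace = [state]
    for name in op_names:
        state = SUBTASKS[name](state)
        trace.append(state)
    return trace

# Enumerate the full task family: all length-T sequences of subtask names.
family = list(itertools.product(SUBTASKS, repeat=T))
print(len(family))          # d**T = 3**4 = 81 distinct composite tasks
print(run_task(family[0], (1, 0, 1)))  # one task's CoT trace: T+1 states
```

The point of the sketch is only the size gap: the model sees a number of tasks scaling with $d$ (here 3), while the family it must generalize to scales as $d^T$ (here 81), which grows exponentially in the composition depth.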