🤖 AI Summary
This paper investigates the learning theory of time-invariant chains-of-thought (CoT), constructed iteratively by a fixed autoregressive generator, under two settings: explicit supervision (where full CoT sequences are observable) and implicit learning (where only prompt-answer pairs are given and the CoT is a latent variable).
Method: We establish, for the first time, chain-length-independent generalization bounds using VC-dimension analysis and linear-threshold modeling, deriving sample and computational complexity upper bounds; we further show that attention arises naturally in the construction, and we build a base class that simultaneously ensures universal representability and computational efficiency.
Contributions/Results: (1) The first generalization bound for time-invariant CoT with sample complexity independent of chain length; (2) A unified theoretical explanation for both CoT learnability and the emergence of attention; (3) A theoretically consistent framework bridging explicit supervision and implicit learning.
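The core construction summarized above can be sketched in a few lines. This is an illustrative sketch, not the paper's implementation; the names `generate_cot` and the toy parity generator below are hypothetical.

```python
def generate_cot(generator, prompt, num_steps):
    """Iterate a fixed sequence-to-next-token generator to build a chain-of-thought.

    The same `generator` is applied at every step (time invariance), which is
    what allows sample complexity to be independent of `num_steps`.
    """
    sequence = list(prompt)
    for _ in range(num_steps):
        next_token = generator(sequence)  # one fixed map, reused at every step
        sequence.append(next_token)
    answer = sequence[-1]  # the final token is taken as the answer
    return sequence, answer


# Toy usage with a parity-style generator (illustrative only):
cot, answer = generate_cot(lambda seq: sum(seq) % 2, [1, 0, 1], num_steps=3)
```

In the explicit-supervision setting the learner sees the full `cot` sequence; in the implicit setting it sees only the `(prompt, answer)` pair, with the intermediate tokens latent.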
📝 Abstract
For a given base class of sequence-to-next-token generators, we consider learning prompt-to-answer mappings obtained by iterating a fixed, time-invariant generator for multiple steps, thus generating a chain-of-thought, and then taking the final token as the answer. We formalize the learning problems both when the chain-of-thought is observed and when training only on prompt-answer pairs, with the chain-of-thought latent. We analyze the sample and computational complexity both in terms of general properties of the base class (e.g. its VC dimension) and for specific base classes such as linear thresholds. We present a simple base class that allows for universal representability and computationally tractable chain-of-thought learning. Central to our development is that time invariance allows for sample complexity that is independent of the length of the chain-of-thought. Attention arises naturally in our construction.
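To make the "base class of sequence-to-next-token generators" concrete, here is a minimal sketch of one member of a linear-threshold base class, as mentioned in the abstract. The sliding-window feature map and the specific weights are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def linear_threshold_generator(weights):
    """Return a time-invariant generator: next token = 1 iff <w, features(seq)> >= 0.

    Features here are simply the last len(weights) tokens, zero-padded:
    a stand-in for whatever embedding the base class actually uses.
    """
    k = len(weights)

    def generator(seq):
        window = ([0] * k + list(seq))[-k:]  # last k tokens, zero-padded on the left
        score = float(np.dot(weights, window))
        return 1 if score >= 0 else 0

    return generator


# Toy usage: a generator over binary tokens with a window of 3 (illustrative only):
gen = linear_threshold_generator(np.array([1.0, -2.0, 1.0]))
```

Because the same `weights` are applied at every step of the chain, only one threshold function is learned regardless of chain length, which is the time-invariance property the sample-complexity bounds exploit.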