🤖 AI Summary
This paper investigates the limits of the expressive power of Transformers, RNNs, and Chain-of-Thought (CoT)-augmented Transformers on Compositional Reasoning Questions (CRQs), a class that includes Boolean formula evaluation and multi-step word problems. Using a unified theoretical framework that combines circuit-complexity and communication-complexity arguments, it establishes asymptotically tight lower bounds on the architectural parameters required to solve CRQs: Ω(log n) depth for Transformers, Ω(log n) embedding dimension for RNNs when inputs arrive in a favorable order (and Ω(n) under adversarial ordering), and a growing number of CoT tokens for CoT-Transformers, where n denotes the input size. The work also provides matching upper-bound constructions for each model (using n CoT tokens in the CoT case), demonstrating that these bounds are essentially optimal. Crucially, it shows that the three architectures have complementary rather than hierarchical reasoning capabilities: even on this single class of problems, none is strictly stronger than the others.
📝 Abstract
We study and compare the expressive power of transformers, RNNs, and transformers with chain-of-thought tokens on a simple and natural class of problems we term Compositional Reasoning Questions (CRQs). This family captures problems like evaluating Boolean formulas and multi-step word problems. Under standard hardness assumptions from circuit complexity and communication complexity, we prove that none of these three architectures can solve CRQs unless some hyperparameter (depth, embedding dimension, and number of chain-of-thought tokens, respectively) grows with the size of the input. We also provide a construction for each architecture that solves CRQs. For transformers, our construction uses depth that is logarithmic in the problem size. For RNNs, logarithmic embedding dimension is necessary and sufficient, so long as the inputs are provided in a certain order. (Otherwise, a linear dimension is necessary.) For transformers with chain of thought, our construction uses $n$ CoT tokens. These results show that, while CRQs are inherently hard, there are several different ways for language models to overcome this hardness. Even for a single class of problems, each architecture has strengths and weaknesses, and none is strictly better than the others.
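To make the problem class concrete, here is a minimal illustrative sketch (not code from the paper) of one kind of CRQ: evaluating a Boolean formula given as a tree. The depth of this tree is the quantity the depth and dimension bounds above scale with; the tuple encoding and function name are our own choices for illustration.

```python
def eval_formula(node):
    """Recursively evaluate a Boolean formula given as nested tuples.

    Leaves are bools; internal nodes are ("and"|"or"|"not", child, ...).
    A straightforward recursive evaluator walks the tree bottom-up,
    combining child values at each operator node.
    """
    if isinstance(node, bool):
        return node
    op, *children = node
    vals = [eval_formula(c) for c in children]
    if op == "and":
        return all(vals)
    if op == "or":
        return any(vals)
    if op == "not":
        return not vals[0]
    raise ValueError(f"unknown operator: {op}")

# Example instance: (x AND NOT y) OR z with x=True, y=True, z=False
formula = ("or", ("and", True, ("not", True)), False)
```

Each operator node depends on the values of its children, so answering the question requires composing intermediate results; this is exactly the multi-step structure that makes CRQs hard for constant-depth or constant-dimension models.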