Arithmetic in Transformers Explained

📅 2024-02-04
📈 Citations: 4
Influential: 0
🤖 AI Summary
This work addresses the low accuracy and poor generalization of Transformers on arithmetic tasks by systematically investigating their internal mechanisms for addition, subtraction, and mixed arithmetic. Using mechanistic interpretability analysis across 44 autoregressive Transformer models, we identify critical attention heads and hierarchical circuits, and develop algorithm-level visualization tools. The work makes three contributions: first, it shows that addition models converge on a common logical algorithm; second, it shows that initializing mixed models with parameters from an addition-only model lets some addition-specific features become polysemantic, serving both operations and improving subtraction accuracy; third, it releases a reusable mechanistic interpretability toolkit. Experiments show (i) addition accuracy exceeding 99.999% for most addition models; (ii) higher subtraction accuracy in mixed models initialized from addition-only parameters; and (iii) a circuit-level explanation of how the addition and mixed algorithms are implemented in the network.

📝 Abstract
While recent work has shown transformers can learn addition, previous models exhibit poor prediction accuracy and are limited to small numbers. Furthermore, the relationship between single-task and multitask arithmetic capabilities remains unexplored. In this work, we analyze 44 autoregressive transformer models trained on addition, subtraction, or both. These include 16 addition-only models, 2 subtraction-only models, 8 "mixed" models trained to perform addition and subtraction, and 14 mixed models initialized with parameters from an addition-only model. The models span 5- to 15-digit questions, 2 to 4 attention heads, and 2 to 3 layers. We show that the addition models converge on a common logical algorithm, with most models achieving >99.999% prediction accuracy. We provide a detailed mechanistic explanation of how this algorithm is implemented within the network architecture. Subtraction-only models have lower accuracy. With the initialized mixed models, through parameter transfer experiments, we explore how multitask learning dynamics evolve, revealing that some features originally specialized for addition become polysemantic, serving both operations, and boosting subtraction accuracy. We explain the mixed algorithm mechanically. Finally, we introduce a reusable library of mechanistic interpretability tools to define, locate, and visualize these algorithmic circuits across multiple models.
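
For orientation, the n-digit addition task can be sketched in a few lines of Python. This is an illustrative sketch only, not the paper's code: the fixed-width question format and the names `format_question` and `add_by_digits` are assumptions made for this example. The schoolbook carry routine is a reference implementation of digit-wise addition; it is not a claim about the specific algorithm the trained models converge on.

```python
import random


def format_question(a: int, b: int, n_digits: int, op: str = "+") -> str:
    """Render a question as a fixed-width string, e.g. '00123+00456='.

    The zero-padded layout is an assumption for illustration.
    """
    return f"{a:0{n_digits}d}{op}{b:0{n_digits}d}="


def add_by_digits(a_str: str, b_str: str) -> str:
    """Schoolbook addition over two equal-length digit strings.

    Returns an answer one digit longer than the inputs, so the final
    carry always has a slot.
    """
    carry = 0
    out = []
    # Walk from the least significant digit to the most significant one.
    for da, db in zip(reversed(a_str), reversed(b_str)):
        total = int(da) + int(db) + carry
        out.append(str(total % 10))   # digit written at this position
        carry = total // 10           # carry passed to the next position
    out.append(str(carry))            # leading digit is the last carry
    return "".join(reversed(out))


if __name__ == "__main__":
    rng = random.Random(0)
    n_digits = 5  # hypothetical setting; the paper's models span 5 to 15 digits
    a, b = rng.randrange(10 ** n_digits), rng.randrange(10 ** n_digits)
    question = format_question(a, b, n_digits)
    answer = add_by_digits(f"{a:0{n_digits}d}", f"{b:0{n_digits}d}")
    print(question + answer)
```

A case such as `99999 + 00001` shows why the task is harder than it looks: a single carry can propagate through every digit position, so the leading answer digit depends on all of the input digits.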

Problem

Research questions and friction points this paper is trying to address.

Transformers' poor accuracy in arithmetic tasks
Unclear relationship between single-task and multitask arithmetic
Mechanistic explanation of arithmetic in transformer models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Autoregressive transformer models
Mechanistic interpretability tools
Parameter transfer experiments