Arithmetic in the Wild: Llama uses Base-10 Addition to Reason About Cyclic Concepts

📅 2026-05-01

🤖 AI Summary
This study investigates how Llama-3.1-8B reasons about cyclic concepts such as months, and whether its computation respects their periodicity. Combining causal abstraction, Fourier feature analysis, and neuron activation tracing, the authors identify a sparse set of 28 MLP neurons (approximately 0.2% of the MLP at layer 18) that are re-used across cyclic tasks and together compute the sum of the two inputs. The model does not perform modular arithmetic directly in the period of the concept (e.g., 12 for months). Instead, it first adds its inputs with a generic base-10 addition mechanism (six + August = 14) and then maps the result back into the cyclic concept space (14 -> February). The Fourier features involved have periods tied to base-10 addition (2, 5, and 10) rather than to the concept's own period, which shows that circular structure in the representations does not imply that the computation itself is modular.
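The two-step procedure described above can be written out as a minimal sketch. It illustrates the abstract algorithm only (add first, wrap second), not the model's internal implementation; the names `MONTHS` and `month_after` and the 1-based month numbering are assumptions made for this example.

```python
# Illustrative sketch only: names and 1-based numbering are assumptions,
# not taken from the paper's code.
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def month_after(offset: int, month: str) -> tuple[int, str]:
    """Return (intermediate sum, resulting month) for 'offset months after month'."""
    # Step 1: ordinary integer addition on the 1-based month number.
    # No wrap-around happens here, so the sum may exceed 12.
    total = offset + (MONTHS.index(month) + 1)
    # Step 2: map the un-wrapped sum back into the 12-month cycle.
    result = MONTHS[(total - 1) % len(MONTHS)]
    return total, result


total, answer = month_after(6, "August")
print(total, answer)
```

The point of separating the two steps is that the intermediate value (14 for "six months after August") exists as an ordinary integer before any wrap-around, which is the behavior the summary attributes to the model.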
📝 Abstract
Does structure in representations imply structure in computation? We study how Llama-3.1-8B reasons over cyclic concepts (e.g., "what month is six months after August?"). Even though Llama-3.1-8B's representations for these concepts are circularly structured, we find that instead of directly computing modular addition in the period of the cyclic concept (e.g., 12 for months), the model re-uses a generic addition mechanism across tasks that operates independently of concept-specific geometry. First, it computes the sum of its two inputs using base-10 addition (six + August=14). Then, it maps this sum back to cyclic concept space (14->February). We show that Llama-3.1-8B uses task-agnostic Fourier features to compute these sums--in fact, these features have periods that respect standard base-10 addition, e.g., 2, 5, and 10, rather than the cyclic concept period (e.g., 12 for months). Furthermore, we identify a sparse set of 28 MLP neurons re-used across all tasks (approximately 0.2% of the MLP at layer 18) that can be partitioned into disjoint clusters, each computing the sum for a Fourier feature with a different period. Our work highlights how an interplay between causal abstraction and feature geometry can deepen our mechanistic understanding of LMs.
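To make the role of the Fourier features more concrete, here is a small sketch of how addition can be carried out on periodic features with periods 2, 5, and 10. It is a textbook illustration of the general idea (adding integers corresponds to adding phases on a circle), not the paper's identified neurons or its analysis code; the helper names `fourier_features`, `add_in_fourier_space`, and `decode_residue` are hypothetical.

```python
import cmath

# Periods named in the abstract; everything else here is an assumption
# made for illustration.
PERIODS = (2, 5, 10)


def fourier_features(n: int) -> dict[int, complex]:
    """Encode integer n as one unit-circle phase per period T."""
    return {T: cmath.exp(2j * cmath.pi * n / T) for T in PERIODS}


def add_in_fourier_space(fa: dict[int, complex], fb: dict[int, complex]) -> dict[int, complex]:
    """Adding integers corresponds to multiplying phases (adding angles)."""
    return {T: fa[T] * fb[T] for T in PERIODS}


def decode_residue(z: complex, T: int) -> int:
    """Recover n mod T from the phase of a period-T feature."""
    angle = cmath.phase(z) % (2 * cmath.pi)  # shift into [0, 2*pi)
    return round(angle * T / (2 * cmath.pi)) % T


# "six + August": August is taken as month number 8 (an assumption of this sketch).
summed = add_in_fourier_space(fourier_features(6), fourier_features(8))
residues = {T: decode_residue(z, T) for T, z in summed.items()}
print(residues)
```

Each feature only determines the sum modulo its own period, so these three features carry base-10 information (parity, residue mod 5, ones digit) about the sum 14 rather than its residue mod 12. How the model combines such features into the full sum is not specified in this section.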
Problem

Research questions and friction points this paper is trying to address.

cyclic concepts
modular addition
representation structure
computation mechanism
large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

modular reasoning
Fourier features
base-10 addition
sparse MLP neurons
cyclic concepts