AI Summary
This work investigates the internal mechanisms underlying arithmetic reasoning in large language models (LLMs), specifically examining whether numbers are represented and processed digit-wise (e.g., units, tens) and whether position-specific neural circuits exist. Combining feature attribution analysis with causal interventions, the study systematically identifies digit-position-specific neuron subgroups within MLP layers across multiple model scales and tokenization schemes. The key contribution is the first empirical demonstration of interpretable, compositional, and robust digit-position-specific circuits that remain stable across model sizes and tokenization strategies. Targeted ablation and activation editing of these circuits causally alter predictions for corresponding digit positions, confirming their functional necessity in arithmetic computation. This work establishes a new paradigm for interpretable numerical reasoning research in LLMs, providing structured, mechanistic evidence for how positional digit information is encoded and utilized.
Abstract
While recent work has begun to uncover the internal strategies that Large Language Models (LLMs) employ for simple arithmetic tasks, a unified understanding of their underlying mechanisms is still lacking. We extend recent findings showing that LLMs represent numbers in a digit-wise manner and present evidence for the existence of digit-position-specific circuits that LLMs use to perform simple arithmetic tasks, i.e., modular subgroups of MLP neurons that operate independently on different digit positions (units, tens, hundreds). Notably, such circuits exist independently of model size and of tokenization strategy: they appear both in models that encode longer numbers digit by digit and in models that encode them as a single token. Using feature importance and causal interventions, we identify and validate the digit-position-specific circuits, revealing a compositional and interpretable structure underlying the solving of arithmetic problems in LLMs. Our interventions selectively alter the model's prediction at targeted digit positions, demonstrating the causal role of digit-position circuits in solving arithmetic tasks.
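The idea of an intervention that changes only one digit position can be illustrated with a toy sketch. This is not the authors' code or model: the 20-neuron layout, the helper names (`hidden_for_sum`, `decode`, `patch`), and the one-hot encoding are all hypothetical, chosen so that each digit position has its own neuron subgroup by construction. The sketch uses activation patching, where activations from a second run are copied into the chosen neurons.

```python
import numpy as np

N_DIGITS = 10  # digits 0-9

# Hypothetical layout: neurons 0-9 encode the units digit of the answer,
# neurons 10-19 encode the tens digit. A real model would not be this clean.
UNITS = slice(0, N_DIGITS)
TENS = slice(N_DIGITS, 2 * N_DIGITS)


def hidden_for_sum(a: int, b: int) -> np.ndarray:
    """Toy 'MLP activations' for a + b (mod 100), one-hot per digit position."""
    total = (a + b) % 100
    h = np.zeros(2 * N_DIGITS)
    h[total % 10] = 1.0               # units subgroup
    h[N_DIGITS + total // 10] = 1.0   # tens subgroup
    return h


def decode(h: np.ndarray) -> int:
    """Read the predicted two-digit answer back out of the activations."""
    units = int(np.argmax(h[UNITS]))
    tens = int(np.argmax(h[TENS]))
    return 10 * tens + units


def patch(h_clean: np.ndarray, h_source: np.ndarray, neurons: slice) -> np.ndarray:
    """Activation patching: overwrite the chosen neurons with a source run's values."""
    h = h_clean.copy()
    h[neurons] = h_source[neurons]
    return h


clean = hidden_for_sum(23, 24)    # answer 47
source = hidden_for_sum(51, 31)   # answer 82

units_patched = decode(patch(clean, source, UNITS))  # only the units digit changes
tens_patched = decode(patch(clean, source, TENS))    # only the tens digit changes
print(decode(clean), units_patched, tens_patched)
```

In this toy setting, patching the units subgroup turns 47 into 42 and patching the tens subgroup turns it into 87, which mirrors the kind of position-selective effect the abstract describes. In a real LLM the subgroups would first have to be found, for example with a feature-importance method, since they are not given by construction.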