Unravelling the Mechanisms of Manipulating Numbers in Language Models

📅 2025-10-30
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) exhibit a paradoxical phenomenon in numerical processing: internal representations of numbers are highly accurate, yet final outputs are frequently erroneous. Method: We conduct cross-model hidden-state analysis and employ transferable probing techniques to characterize numeric embeddings across architectures, and we localize error sources through layer-wise intervention and attribution in attention and feed-forward networks. Contribution/Results: We find that numeric embeddings are systematically encoded with high fidelity across layers and model families, far exceeding output accuracy, and identify higher-layer attention and FFN modules, rather than low-level encoding distortion, as the primary loci of relational miscomputation. We quantify the layer-wise evolution of numeric representation accuracy and establish its theoretical lower bound. Furthermore, we design a general-purpose probe that enables precise error attribution to specific transformer layers. This work provides the first interpretable mechanistic framework for understanding LLMs' numerical reasoning bottlenecks and offers concrete directions for architectural refinement and improved numerical robustness.
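The probing idea in the summary can be sketched in miniature. The snippet below trains a linear probe to decode a number's value from a synthetic "hidden state"; the state dimensionality, the linear encoding direction, the noise level, and the training setup are all illustrative assumptions, not the paper's actual experimental setup, which probes real LLM activations.

```python
import random

random.seed(0)
DIM = 4  # toy hidden-state dimensionality (assumption)

def hidden_state(value):
    """Toy stand-in for an LLM hidden state: the number is encoded
    linearly along one direction, plus small Gaussian noise."""
    vec = [random.gauss(0, 0.01) for _ in range(DIM)]
    vec[0] += 0.1 * value  # systematic, high-fidelity encoding of the value
    return vec

# Training data: numbers 0..99 paired with their toy hidden states.
train = [(v, hidden_state(v)) for v in range(100)]

# Fit linear probe weights w and bias b by stochastic gradient descent
# on squared error (a minimal substitute for a proper regression fit).
w, b = [0.0] * DIM, 0.0
lr = 0.001
for _ in range(2000):
    for value, vec in train:
        pred = sum(wi * xi for wi, xi in zip(w, vec)) + b
        err = pred - value
        for i in range(DIM):
            w[i] -= lr * err * vec[i]
        b -= lr * err

def probe(vec):
    """Decode the numeric value linearly from a hidden state."""
    return sum(wi * xi for wi, xi in zip(w, vec)) + b

# The probe recovers held-out values well above the noise floor.
print([round(probe(hidden_state(v)), 1) for v in [7, 42, 93]])
```

Because the toy encoding is linear and consistent across inputs, a single probe transfers across all values, mirroring the paper's observation that numeric representations are systematic enough to support universal probes.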

📝 Abstract
Recent work has shown that different large language models (LLMs) converge to similar and accurate input embedding representations for numbers. These findings conflict with the documented propensity of LLMs to produce erroneous outputs when dealing with numeric information. In this work, we aim to explain this conflict by exploring how language models manipulate numbers and quantify the lower bounds of accuracy of these mechanisms. We find that despite surfacing errors, different language models learn interchangeable representations of numbers that are systematic, highly accurate and universal across their hidden states and the types of input contexts. This allows us to create universal probes for each LLM and to trace information -- including the causes of output errors -- to specific layers. Our results lay the groundwork for a fundamental understanding of how pre-trained LLMs manipulate numbers and outline the potential of more accurate probing techniques in addressing refinements of LLMs' architectures.
Problem

Research questions and friction points this paper is trying to address.

Explaining the conflict between accurate internal number representations and erroneous outputs
Quantifying lower bounds on the accuracy of numerical manipulation mechanisms
Identifying error causes through universal probing techniques
Innovation

Methods, ideas, or system contributions that make the work stand out.

Probes trace numeric errors to specific model layers
Universal probes analyze interchangeable number representations
Systematic representations enable accurate numeric manipulation
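The layer-tracing idea in the bullets above can be sketched as follows. The snippet simulates a toy model whose per-layer decoded values stay faithful until one higher layer introduces a relational error, then flags the first layer where probe error jumps. The layer count, noise level, faulty-layer index, and error magnitude are hypothetical choices for illustration only.

```python
import random

random.seed(1)
N_LAYERS = 12
FAULTY_LAYER = 9  # hypothetical higher layer that miscomputes the relation

def layer_outputs(true_value):
    """Toy probe readouts after each layer: faithful to the true value until
    FAULTY_LAYER, where a systematic error is introduced and propagated."""
    outs, current = [], true_value
    for layer in range(N_LAYERS):
        if layer == FAULTY_LAYER:
            current = true_value + 10  # e.g. a carry/relational error
        outs.append(current + random.gauss(0, 0.05))
    return outs

def attribute_error(samples, threshold=1.0):
    """Return the first layer whose mean absolute probe error over the
    sample set exceeds the threshold, or None if none does."""
    for layer in range(N_LAYERS):
        err = sum(abs(layer_outputs(v)[layer] - v) for v in samples) / len(samples)
        if err > threshold:
            return layer
    return None

print(attribute_error(range(50)))  # → 9
```

Because pre-fault layers decode almost perfectly, the error attribution is sharp: the jump in probe error isolates the exact layer where the miscomputation enters, which is the mechanism the Innovation bullets describe.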
🔎 Similar Papers
No similar papers found.