🤖 AI Summary
This study investigates how Llama-3.1-8B reasons about cyclic concepts such as months, and whether its computation respects their periodicity. Combining causal abstraction with Fourier feature analysis and neuron-level analysis, the authors identify a sparse set of 28 MLP neurons (approximately 0.2% of the MLP at layer 18) that are re-used across all the cyclic tasks studied to compute sums. The model does not directly execute modular arithmetic in the period of the concept; instead, it first computes the sum of its inputs using generic base-10 addition and then maps the result back into the cyclic concept space. Although the model's representations of these concepts are circularly structured, its computation relies on a task-agnostic addition mechanism rather than on concept-specific periodicity, which sheds light on the relationship between internal mechanisms and representational geometry in language models.
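The two-step process described above can be illustrated with a minimal Python sketch. It is only a behavioral analogy for "add first, wrap afterwards", not the model's actual circuit; the function name `add_months_two_step` and the 1-based month indexing are assumptions made for illustration.

```python
# Illustrative sketch only: mimics "sum in base 10, then map back to the cycle".
# The helper name and the 1-based month indexing are assumptions, not from the paper.
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def add_months_two_step(offset: int, start: str) -> tuple[int, str]:
    """Return the intermediate plain sum and the resulting month."""
    start_index = MONTHS.index(start) + 1  # August -> 8
    total = offset + start_index           # step 1: ordinary integer addition
    # Step 2: map the (possibly out-of-range) sum back onto the 12-month cycle.
    result = MONTHS[(total - 1) % len(MONTHS)]
    return total, result


if __name__ == "__main__":
    print(add_months_two_step(6, "August"))
```

The intermediate value is kept in the return value to make the point that the sum (here 14) exists as an unwrapped quantity before it is mapped to a month.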
📝 Abstract
Does structure in representations imply structure in computation? We study how Llama-3.1-8B reasons over cyclic concepts (e.g., "what month is six months after August?"). Even though Llama-3.1-8B's representations for these concepts are circularly structured, we find that instead of directly computing modular addition in the period of the cyclic concept (e.g., 12 for months), the model re-uses a generic addition mechanism across tasks that operates independently of concept-specific geometry. First, it computes the sum of its two inputs using base-10 addition (six + August = 14). Then, it maps this sum back to cyclic concept space (14 -> February). We show that Llama-3.1-8B uses task-agnostic Fourier features to compute these sums. These features have periods that respect standard base-10 addition (e.g., 2, 5, and 10) rather than the cyclic concept period (e.g., 12 for months). Furthermore, we identify a sparse set of 28 MLP neurons re-used across all tasks (approximately 0.2% of the MLP at layer 18) that can be partitioned into disjoint clusters, each computing the sum for a Fourier feature with a different period. Our work highlights how an interplay between causal abstraction and feature geometry can deepen our mechanistic understanding of LMs.
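As a rough illustration of what "computing a sum with Fourier features" can mean, the sketch below encodes an integer as a point on the unit circle for each period and adds two numbers by multiplying those points (which adds their phases). This is a textbook construction chosen for illustration, not the paper's measured mechanism; the periods 2, 5, and 10 come from the abstract, while the function names and the decoding step are assumptions.

```python
import cmath
import math

# Periods named in the abstract; everything else here is an illustrative assumption.
PERIODS = (2, 5, 10)


def fourier_feature(n: int, period: int) -> complex:
    """Encode n as a unit-circle point whose angle is 2*pi*n/period."""
    return cmath.exp(2j * math.pi * n / period)


def add_in_fourier_space(a: int, b: int, period: int) -> complex:
    """Multiplying unit phasors adds their angles, i.e. encodes a + b."""
    return fourier_feature(a, period) * fourier_feature(b, period)


def decode_residue(z: complex, period: int) -> int:
    """Read back (a + b) mod period from the phase of z."""
    angle = cmath.phase(z) % (2 * math.pi)
    return round(angle * period / (2 * math.pi)) % period


def sum_residues(a: int, b: int) -> dict[int, int]:
    """Residue of a + b at each period, computed only through the phasors."""
    return {t: decode_residue(add_in_fourier_space(a, b, t), t) for t in PERIODS}


if __name__ == "__main__":
    # six + August (8): the features track the base-10 sum 14, not 14 mod 12.
    print(sum_residues(6, 8))
```

None of these periods is 12, so nothing in this representation "knows" about the month cycle; in this toy picture, wrapping 14 to February would have to happen in a separate later step, which matches the two-stage account in the abstract.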