(How) Do Language Models Track State?

📅 2025-03-04
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work investigates how Transformer language models implicitly track unobserved state changes in evolving environments, using permutation composition as a benchmark task. The problem is formalized via permutation algebra and analyzed with mechanistic interpretability tools. Both pretrained and fine-tuned models consistently learn one of two distinct state-tracking mechanisms: an associative scan, or a hybrid strategy that uses the easy-to-compute feature of permutation parity to partially prune the space of outputs before refining with an associative scan. The two mechanisms exhibit markedly different robustness properties, and intermediate training tasks can steer models toward one or the other by encouraging or suppressing the parity heuristic. This provides an empirically grounded account of how language models perform implicit state tracking, and shows that the emergence of these mechanisms can be predicted and controlled.

📝 Abstract
Transformer language models (LMs) exhibit behaviors -- from storytelling to code generation -- that appear to require tracking the unobserved state of an evolving world. How do they do so? We study state tracking in LMs trained or fine-tuned to compose permutations (i.e., to compute the order of a set of objects after a sequence of swaps). Despite the simple algebraic structure of this problem, many other tasks (e.g., simulation of finite automata and evaluation of Boolean expressions) can be reduced to permutation composition, making it a natural model for state tracking in general. We show that LMs consistently learn one of two state tracking mechanisms for this task. The first closely resembles the "associative scan" construction used in recent theoretical work by Liu et al. (2023) and Merrill et al. (2024). The second uses an easy-to-compute feature (permutation parity) to partially prune the space of outputs, then refines this with an associative scan. The two mechanisms exhibit markedly different robustness properties, and we show how to steer LMs toward one or the other with intermediate training tasks that encourage or suppress the heuristics. Our results demonstrate that transformer LMs, whether pretrained or fine-tuned, can learn to implement efficient and interpretable state tracking mechanisms, and the emergence of these mechanisms can be predicted and controlled.
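The abstract's framing can be made concrete with a small sketch (not from the paper; all function names are illustrative). Applying a sequence of swaps is equivalent to composing permutations, and because composition is associative, the running state can be computed with a parallel prefix ("associative scan") whose dependency depth grows logarithmically rather than linearly:

```python
# Illustrative sketch of state tracking as permutation composition.
# Names (compose, swap, associative_scan) are assumptions, not the
# paper's code.

def compose(p, q):
    """Permutation 'apply p, then q', with p, q as tuples of indices."""
    return tuple(q[p[i]] for i in range(len(p)))

def swap(n, i, j):
    """Permutation of n elements that exchanges positions i and j."""
    p = list(range(n))
    p[i], p[j] = p[j], p[i]
    return tuple(p)

def associative_scan(perms):
    """All prefix compositions via pairwise combination (Hillis-Steele).

    Equivalent to a sequential left-to-right fold, but each pass doubles
    the span each prefix covers, so only O(log n) dependent steps are
    needed -- the construction transformers can realize in few layers.
    """
    prefixes = list(perms)
    step = 1
    while step < len(prefixes):
        # Iterate high-to-low so reads see values from the previous pass.
        for i in range(len(prefixes) - 1, step - 1, -1):
            prefixes[i] = compose(prefixes[i - step], prefixes[i])
        step *= 2
    return prefixes
```

For example, `associative_scan([swap(3, 0, 1), swap(3, 1, 2)])[-1]` gives the object order after both swaps, identical to composing them sequentially.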
Problem

Research questions and friction points this paper is trying to address.

How do transformer LMs track the unobserved state of an evolving world?
What mechanisms do LMs learn when trained or fine-tuned to compose permutations?
Can the learned state-tracking mechanisms be identified, predicted, and controlled?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Identifies two learned mechanisms: an associative scan, and a parity-based heuristic that prunes outputs before an associative scan refines them.
Shows the two mechanisms have markedly different robustness properties.
Steers LMs toward either mechanism with intermediate training tasks that encourage or suppress the parity heuristic.
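The parity heuristic mentioned above is cheap to compute: the parity of a composed sequence is just the XOR of the individual parities, so it gives a partial signal about the final state without performing the full composition. A minimal sketch (illustrative code, not the paper's; function names are assumptions):

```python
# Permutation parity as a cheap partial signal: it rules out roughly
# half the candidate outputs, which a model can then refine with a
# full associative scan.

def parity(p):
    """Return 0 for an even permutation, 1 for odd, via cycle lengths.

    A cycle of length k decomposes into k - 1 transpositions, so the
    overall parity is the XOR of (length - 1) over all cycles.
    """
    seen, par = set(), 0
    for start in range(len(p)):
        if start in seen:
            continue
        length, j = 0, start
        while j not in seen:
            seen.add(j)
            j = p[j]
            length += 1
        par ^= (length - 1) & 1
    return par

def sequence_parity(perms):
    """Parity of the whole composed sequence: XOR of individual parities."""
    out = 0
    for p in perms:
        out ^= parity(p)
    return out
```

Because each swap is odd, `sequence_parity` reduces to the number of swaps mod 2, which is why the feature is so easy for a model to pick up early in training.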