🤖 AI Summary
What do large language models (LLMs) fundamentally model—human cognitive capacities or merely statistical regularities in training corpora? This work addresses this question by analyzing the computational essence of the Transformer architecture, revealing its intrinsic limitation to linear computation and inability to capture the supralinear cognitive properties of human language. To formalize LLMs’ context-sensitive generation mechanism, we propose the “discourse machine” paradigm, grounded in shortcut automata. Integrating formal language theory, computational invariance analysis, and cognitive science perspectives, we rigorously demonstrate that LLMs primarily encode surface-level statistical patterns in training data, with their capability boundaries stemming from architectural computational constraints. Crucially, this is the first systematic application of shortcut automata theory to LLM interpretability research, yielding a novel theoretical framework for understanding both the generative mechanisms and the fundamental limits of LLMs.
📝 Abstract
What do large language models actually model? Do they tell us something about human capacities, or are they models of the corpus we've trained them on? I give a non-deflationary defence of the latter position. Cognitive science tells us that linguistic capabilities in humans rely on supralinear formats for computation. The transformer architecture, by contrast, supports at best a linear format for processing. This argument will rely primarily on certain invariants of the computational architecture of transformers. I then suggest a positive story about what transformers are doing, focusing on Liu et al. (2022)'s intriguing speculations about shortcut automata. I conclude by explaining why I don't think this is a terribly deflationary story. Language is not (just) a means for expressing inner states but also a kind of 'discourse machine' that lets us make new language given appropriate context. We have learned to use this technology in one way; LLMs have learned to use it too, but via very different means.