🤖 AI Summary
This work investigates the internal mechanisms Transformers can use to implement rule-driven dialogue behavior, using the classic ELIZA chatbot as an interpretable testbed. Method: We draw a formal correspondence between Transformers and rule-based dialogue systems, presenting an exact Transformer construction of ELIZA that builds on prior constructions for simulating finite-state automata, and we train Transformers on synthetically generated ELIZA conversations, analyzing them with mechanistic techniques such as induction-head detection and probing of hidden states. Contribution/Results: We prove by construction that a Transformer can exactly realize ELIZA's pattern-matching and transformation logic. Empirically, trained models do not rely on precise position-based copying; they instead favor induction heads, and they use their own intermediate generations to simulate recurrent data structures, akin to an implicit scratchpad or Chain-of-Thought. By connecting neural chatbots to interpretable, symbolic mechanisms, this work provides a formal foundation and a new framework for the mechanistic analysis of conversational agents.
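To ground what "transformation logic" means here, below is a minimal, illustrative sketch of an ELIZA-style decomposition/reassembly rule in Python. The rule set and helper names are our own, not the paper's; the original ELIZA uses a much larger, keyword-ranked script.

```python
import re

# Pronoun reflection, the heart of ELIZA-style "local pattern matching":
# matched fragments are echoed back with first/second person swapped.
REFLECTIONS = {"i": "you", "my": "your", "am": "are", "me": "you"}

def reflect(fragment: str) -> str:
    return " ".join(REFLECTIONS.get(w.lower(), w) for w in fragment.split())

# Each rule is a (decomposition pattern, reassembly template) pair.
RULES = [
    (r".*\bI am (.*)", "How long have you been {0}?"),
    (r".*\bI feel (.*)", "Why do you feel {0}?"),
]

def respond(utterance: str) -> str:
    for pattern, template in RULES:
        match = re.match(pattern, utterance, re.IGNORECASE)
        if match:
            return template.format(reflect(match.group(1)))
    return "Please go on."  # fallback when no rule fires

print(respond("I am worried about my future"))
# -> How long have you been worried about your future?
```

The paper's theoretical construction shows that a Transformer can realize exactly this kind of match-then-transform computation.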
📝 Abstract
What kind of internal mechanisms might Transformers use to conduct fluid, natural-sounding conversations? Prior work has illustrated by construction how Transformers can solve various synthetic tasks, such as sorting a list or recognizing formal languages, but it remains unclear how to extend this approach to a conversational setting. In this work, we propose using ELIZA, a classic rule-based chatbot, as a setting for formal, mechanistic analysis of Transformer-based chatbots. ELIZA allows us to formally model key aspects of conversation, including local pattern matching and long-term dialogue state tracking. We first present a theoretical construction of a Transformer that implements the ELIZA chatbot. Building on prior constructions, particularly those for simulating finite-state automata, we show how simpler mechanisms can be composed and extended to produce more sophisticated behavior. Next, we conduct a set of empirical analyses of Transformers trained on synthetically generated ELIZA conversations. Our analysis illustrates the kinds of mechanisms these models tend to prefer: for example, models favor an induction head mechanism over a more precise, position-based copying mechanism, and they use intermediate generations to simulate recurrent data structures, akin to an implicit scratchpad or Chain-of-Thought. Overall, by drawing an explicit connection between neural chatbots and interpretable, symbolic mechanisms, our results provide a new framework for the mechanistic analysis of conversational agents.
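As a concrete illustration of the two copying strategies the abstract contrasts (a toy Python sketch of our own, not the paper's code): an induction head matches the current token against earlier context and copies the token that followed it, whereas a position-based mechanism copies from a fixed relative offset and breaks as soon as positions shift.

```python
def induction_copy(tokens):
    """Induction-head-style prediction: find the most recent earlier
    occurrence of the current token and copy its successor."""
    current = tokens[-1]
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]
    return None

def positional_copy(tokens, offset):
    """Position-based copying: grab the token a fixed distance back,
    regardless of content."""
    return tokens[-offset] if offset <= len(tokens) else None

seq = list("abcdabc")           # repeating block of period 4
print(induction_copy(seq))      # 'd': earlier 'c' (index 2) is followed by 'd'
print(positional_copy(seq, 4))  # 'd': same answer while spacing is regular

shifted = list("abcdXabc")          # one extra token shifts all positions
print(induction_copy(shifted))      # still 'd': content-based, robust
print(positional_copy(shifted, 4))  # 'X': the fixed offset now points wrong
```

The content-based strategy generalizes when insertions change token positions, which is one intuition for why trained models might prefer it over exact positional copying.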