🤖 AI Summary
Current large language models (LLMs) are constrained by the autoregressive paradigm and cannot perform token-level counterfactual reasoning, for example answering "how would the subsequent text change if a given token were replaced?" This work introduces the first Gumbel-Max–based, token-level structural causal model for plug-and-play counterfactual text generation, requiring neither fine-tuning nor prompt engineering. The method models causal dependencies among tokens via explicit causal interventions, enabling direct counterfactual edits to generated sequences. It is evaluated on Llama-3-8B-Instruct and Ministral-8B-Instruct, where counterfactual outputs are reported to be qualitatively coherent and quantitatively robust under automated metrics. The approach is also applied to bias detection, exposing underlying worldviews and decision preferences embedded in LLMs. By endowing LLMs with stateful, intervention-aware reasoning capabilities, this work addresses the statelessness of standard autoregressive models and proposes a paradigm for controllable, interpretable, and causally grounded text generation.
📝 Abstract
"Sure, I am happy to generate a story for you: Captain Lyra stood at the helm of her trusty ship, the Maelstrom's Fury, gazing out at the endless sea. [...] Lyra's eyes welled up with tears as she realized the bitter truth - she had sacrificed everything for fleeting riches, and lost the love of her crew, her family, and herself."Although this story, generated by a large language model, is captivating, one may wonder -- how would the story have unfolded if the model had chosen"Captain Maeve"as the protagonist instead? We cannot know. State-of-the-art large language models are stateless -- they maintain no internal memory or state. Given a prompt, they generate a sequence of tokens as an output using an autoregressive process. As a consequence, they cannot reason about counterfactual alternatives to tokens they have generated in the past. In this work, our goal is to enhance them with this functionality. To this end, we develop a causal model of token generation that builds upon the Gumbel-Max structural causal model. Our model allows any large language model to perform counterfactual token generation at almost no cost in comparison with vanilla token generation, it is embarrassingly simple to implement, and it does not require any fine-tuning nor prompt engineering. We implement our model on Llama 3 8B-Instruct and Ministral-8B-Instruct and conduct a qualitative and a quantitative analysis of counterfactually generated text. We conclude with a demonstrative application of counterfactual token generation for bias detection, unveiling interesting insights about the model of the world constructed by large language models.