Markovian Transformers for Informative Language Modeling

📅 2024-04-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) offer limited interpretability in chain-of-thought (CoT) reasoning: final predictions often rely on the original prompt rather than on the intermediate reasoning steps. Method: This paper proposes a mechanism that makes the CoT causally necessary by requiring each prediction to depend solely on the preceding reasoning text, so the CoT acts as a Markovian mediator. The authors design an "informativeness" objective to guide training and develop a Markovian Transformer optimized via policy gradients, implemented on Llama 3.1 8B with conditional-independence modeling and perturbation-robust training. Contributions/Results: The approach achieves an absolute +33.2% accuracy gain on GSM8K; perturbation analysis confirms the CoT's causal necessity; and reasoning trajectories transfer across models, improving generalization and transparency.
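The "informativeness" objective and policy-gradient training described above can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the exact reward shape, baseline handling, and function names here are assumptions.

```python
import math

def informativeness_reward(logp_answer_given_cot, logp_answer_baseline):
    """Informativeness: how much conditioning on the CoT improves the
    log-likelihood of the correct continuation over a no-CoT baseline
    (an assumed form of the paper's objective)."""
    return logp_answer_given_cot - logp_answer_baseline

def reinforce_loss(logp_cot_tokens, reward, baseline=0.0):
    """REINFORCE-style surrogate loss: the gradient of this scalar is
    -(reward - baseline) times the gradient of the CoT's log-probability,
    so higher-reward CoTs are made more likely."""
    return -(reward - baseline) * sum(logp_cot_tokens)

# Toy numbers: the CoT makes the answer more likely (p=0.6) than the
# prompt-only baseline (p=0.2), giving a positive reward of ln(3).
r = informativeness_reward(math.log(0.6), math.log(0.2))
loss = reinforce_loss([-0.5, -0.3, -0.7], r)  # positive: push CoT logp up
```

In practice the log-probabilities would come from the language model's forward passes, and the reward would be centered with a learned or moving-average baseline to reduce gradient variance.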

📝 Abstract
Chain-of-Thought (CoT) reasoning often fails to faithfully reflect a language model's underlying decision process. We address this by making CoT text causally essential in a "Markovian" language model, factoring next-token prediction through an intermediate CoT and training it to predict future tokens independently of the original prompt. We formalize this via an "informativeness" objective that quantifies how much a trained CoT improves next-token predictions over a baseline. Using policy gradient, we show that Llama 3.1 8B achieves a 33.2% absolute accuracy improvement on GSM8K. Perturbation tests confirm stronger reliance on the CoT, while cross-model transfers indicate these reasoning traces generalize across interpreters. Our approach enhances both accuracy and interpretability, potentially extending CoT reasoning to arbitrarily long contexts and diverse tasks.
Problem

Research questions and friction points this paper is trying to address.

Language Models
Explainability
Chained Reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Chain-of-Thought (CoT) Enhancement
Predictive Accuracy Improvement
Explainable Reasoning