🤖 AI Summary
Large language models (LLMs) often produce chain-of-thought (CoT) reasoning that is weakly interpretable: final predictions frequently depend on the original prompt rather than on the intermediate reasoning steps. Method: The paper proposes a causally necessary CoT mechanism in which each prediction depends explicitly and solely on the preceding reasoning text, making the CoT a Markovian, necessary mediator. Training is guided by an "informativeness" objective and carried out with policy gradients on a Markovian Transformer architecture, implemented on Llama 3.1 8B with conditional-independence modeling and perturbation-robust training. Contributions/Results: The approach achieves an absolute +33.2% accuracy gain on GSM8K; perturbation analysis confirms strong causal reliance on the CoT; and reasoning trajectories transfer successfully across models, improving generalization and transparency.
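The core structural idea — the answer predictor conditions only on the generated CoT, never on the original prompt — can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: `predict_with_markovian_cot`, `toy_model`, and `toy_generate_cot` are hypothetical names standing in for a real LLM and decoder.

```python
def predict_with_markovian_cot(model, prompt, generate_cot):
    """Markovian CoT: the final prediction conditions ONLY on the
    generated CoT, never on the original prompt, so the CoT must
    carry all task-relevant information (a causal bottleneck)."""
    cot = generate_cot(model, prompt)   # the CoT generator may see the prompt
    answer = model(cot)                 # the answer predictor sees the CoT only
    return cot, answer

# Toy stand-ins (hypothetical, for illustration only):
def toy_model(context):
    # "Predicts" by summing the digits visible in its context.
    return sum(int(ch) for ch in context if ch.isdigit())

def toy_generate_cot(model, prompt):
    # A CoT that restates everything the answer step will need.
    return "digits: " + "".join(ch for ch in prompt if ch.isdigit())

cot, answer = predict_with_markovian_cot(toy_model, "add 3 and 4", toy_generate_cot)
```

Because the answer step never sees the prompt, any information the CoT fails to restate is simply lost — which is what makes the CoT causally necessary rather than optional decoration.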
📝 Abstract
Chain-of-Thought (CoT) reasoning often fails to faithfully reflect a language model's underlying decision process. We address this by making CoT text causally essential in a "Markovian" language model, factoring next-token prediction through an intermediate CoT and training it to predict future tokens independently of the original prompt. We formalize this via an "informativeness" objective that quantifies how much a trained CoT improves next-token predictions over a baseline. Using policy gradient, we show that Llama 3.1 8B achieves a 33.2% absolute accuracy improvement on GSM8K. Perturbation tests confirm stronger reliance on the CoT, while cross-model transfers indicate these reasoning traces generalize across interpreters. Our approach enhances both accuracy and interpretability, potentially extending CoT reasoning to arbitrarily long contexts and diverse tasks.
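One plausible reading of the "informativeness" objective is a log-likelihood improvement: the reward for a sampled CoT is how much it raises the interpreter's log-probability of the future tokens over a CoT-free baseline, and that reward weights a REINFORCE-style policy gradient on the CoT generator. A minimal numeric sketch, assuming this reading (the function names and toy log-probabilities are illustrative, not from the paper):

```python
def informativeness(logp_with_cot, logp_baseline):
    """Reward: improvement in log-likelihood of the future tokens
    when the interpreter conditions on the CoT vs. a baseline.
    Positive means the CoT carries predictive signal."""
    return logp_with_cot - logp_baseline

def reinforce_advantage(reward, mean_reward):
    """REINFORCE-style advantage used to weight grad log pi(CoT | prompt);
    subtracting a mean baseline reduces gradient variance."""
    return reward - mean_reward

# Toy log-probabilities of the answer tokens, with and without the CoT:
pairs = [(-2.0, -5.0), (-1.5, -4.0), (-3.0, -3.5)]
rewards = [informativeness(lp_cot, lp_base) for lp_cot, lp_base in pairs]
mean_r = sum(rewards) / len(rewards)
advantages = [reinforce_advantage(r, mean_r) for r in rewards]
```

CoTs whose advantage is positive are reinforced; those that predict the future tokens no better than the baseline are pushed down, which is what drives the generator toward causally load-bearing reasoning.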