Reversal Blessing: Thinking Backward May Outpace Thinking Forward in Multi-choice Questions

📅 2025-02-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates whether right-to-left (R2L) autoregressive factorization improves multiple-choice question (MCQ) performance over the standard left-to-right (L2R) paradigm. Method: systematic experiments across 2B–8B parameter models, combining R2L autoregressive pretraining, controlled arithmetic simulation analysis, and evaluation on MCQ benchmarks spanning logical reasoning, commonsense understanding, and truthfulness assessment. Contribution/Results: empirical evidence that R2L factorization can significantly enhance knowledge extraction and reasoning in MCQ tasks. The analysis identifies calibration, computability, and directional conditional entropy as key mechanistic determinants of which factorization direction works better. R2L models consistently outperform L2R baselines across diverse MCQ datasets, offering both theoretical insight into factorization choice and a practical challenge to the dominant L2R paradigm.
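A minimal sketch of the two ideas in the summary: R2L pretraining can reuse a standard causal LM by reversing token order, and MCQ options can be scored by chain-rule log-likelihood in either direction. The toy vocabulary, the uniform conditional model, and all function names here are illustrative assumptions, not the paper's implementation.

```python
import math

def reverse_sequence(tokens):
    # R2L pretraining trick (assumed setup): train a standard next-token
    # model on reversed sequences, so p(x_t | x_{>t}) is what gets learned.
    return tokens[::-1]

def sequence_log_prob(tokens, cond_log_prob):
    # Chain-rule score: sum of log p(token_i | all preceding tokens).
    return sum(cond_log_prob(tokens[:i], tok) for i, tok in enumerate(tokens))

# Toy conditional model: uniform over a 4-symbol vocabulary (an assumption
# chosen so the example is deterministic; a trained model replaces this).
VOCAB = ["A", "B", "C", "D"]
uniform = lambda prefix, tok: math.log(1.0 / len(VOCAB))

question_plus_answer = ["C", "A", "B"]
l2r_score = sequence_log_prob(question_plus_answer, uniform)
r2l_score = sequence_log_prob(reverse_sequence(question_plus_answer), uniform)

# Under this uniform model both directions assign the same total probability,
# since both factorize the same joint distribution exactly. Trained L2R and
# R2L models generally diverge, which is the asymmetry the paper studies.
```

The point of the toy model is that any direction-dependent gap on MCQs must come from learned approximation error, not from the factorization identity itself.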

📝 Abstract
Language models usually use left-to-right (L2R) autoregressive factorization. However, L2R factorization is not necessarily the best inductive bias, so we investigate whether alternative factorizations of the text distribution can be beneficial on some tasks. We study right-to-left (R2L) training as a compelling alternative, using multiple-choice questions (MCQs) as a test bed for knowledge extraction and reasoning. Through extensive experiments across model sizes (2B–8B parameters) and training datasets, we find that R2L models can significantly outperform L2R models on several MCQ benchmarks, including logical reasoning, commonsense understanding, and truthfulness assessment tasks. Our analysis reveals that this performance difference may be fundamentally linked to multiple factors, including calibration, computability, and directional conditional entropy. We ablate the impact of these factors through controlled simulation studies on arithmetic tasks, where they can be better disentangled. Our work demonstrates that exploring alternative factorizations of the text distribution can improve LLM capabilities, and it provides theoretical insight into which factorization best approximates the human language distribution and into when each reasoning order is more advantageous.
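The two factorizations compared in the abstract can be written explicitly for a token sequence $x_1, \ldots, x_T$:

$$p_{\text{L2R}}(x) = \prod_{t=1}^{T} p(x_t \mid x_1, \ldots, x_{t-1}), \qquad p_{\text{R2L}}(x) = \prod_{t=1}^{T} p(x_t \mid x_{t+1}, \ldots, x_T).$$

Both are exact chain-rule decompositions of the same joint distribution; they differ in practice because the per-direction conditional distributions have different conditional entropies and are not equally easy for a finite model to learn, which is the source of the direction-dependent gaps the abstract describes.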
Problem

Research questions and friction points this paper is trying to address.

Investigates alternative text distribution factorizations.
Compares right-to-left versus left-to-right model training.
Explores factorization impact on multi-choice question performance.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Right-to-left training approach
Enhanced multiple-choice question performance
Theoretical insights into factorization benefits