RATIONALYST: Pre-training Process-Supervision for Improving Reasoning

📅 2024-10-01
🏛️ arXiv.org
📈 Citations: 3
Influential: 0
🤖 AI Summary
Large language models (LLMs) often omit critical logical premises in their reasoning steps, mimicking the shortcuts of informal language and producing incomplete, implicit reasoning chains. Method: the paper proposes a process-supervision paradigm that first mines 79K high-quality rationales from unlabeled corpora (e.g., The Pile) and a mix of reasoning datasets via self-supervised filtering, then fine-tunes LLaMA-3-8B on these extracted signals with minimal human intervention and no external verifier. Contribution/Results: the resulting model, RATIONALYST, makes intermediate reasoning premises explicit and, across seven representative reasoning benchmarks, improves accuracy by an average of 3.9%, outperforming both much larger GPT-4-based verifiers and similarly sized models fine-tuned on matching training sets.
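The self-supervised mining step described above can be sketched as follows. This is a toy illustration, not the authors' code: `generate_rationale` and `logprob` are hypothetical stand-ins for real LLM calls, replaced here by trivial stubs so the example runs. The filtering signal matches the paper's idea: a candidate rationale is kept only if conditioning on it makes the following text easier to predict.

```python
# Sketch of RATIONALYST-style self-supervised rationale mining.
# generate_rationale and logprob are toy stubs standing in for LLM calls.

def generate_rationale(chunk: str) -> str:
    # Stand-in for prompting an LLM to state the implicit rationale
    # that connects this chunk to what follows; here it is hardcoded.
    return "3 + 2 = 5"

def logprob(context: str, target: str) -> float:
    # Toy proxy for model log-probability of `target` given `context`:
    # more shared vocabulary counts as "easier to predict".
    ctx, tgt = set(context.split()), set(target.split())
    return len(ctx & tgt) - len(tgt)

def mine_rationales(chunks):
    """Keep a rationale only if conditioning on it raises the
    (proxy) log-probability of the next chunk."""
    kept = []
    for i in range(len(chunks) - 1):
        r = generate_rationale(chunks[i])
        base = logprob(chunks[i], chunks[i + 1])
        with_r = logprob(chunks[i] + " " + r, chunks[i + 1])
        if with_r > base:  # the rationale helped predict the future text
            kept.append((chunks[i], r))
    return kept

chunks = ["Tom has 3 apples and buys 2 more", "so Tom now has 5 apples"]
print(mine_rationales(chunks))
```

In this toy run the rationale introduces "5", which appears in the next chunk, so the proxy log-probability rises and the pair is kept; rationales that do not help predict the continuation are discarded.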

📝 Abstract
The reasoning steps generated by LLMs might be incomplete, as they mimic logical leaps common in everyday communication found in their pre-training data: underlying rationales are frequently left implicit (unstated). To address this challenge, we introduce RATIONALYST, a model for process-supervision of reasoning based on pre-training on a vast collection of rationale annotations extracted from unlabeled data. We extract 79k rationales from a web-scale unlabelled dataset (the Pile) and a combination of reasoning datasets with minimal human intervention. This web-scale pre-training for reasoning allows RATIONALYST to consistently generalize across diverse reasoning tasks, including mathematical, commonsense, scientific, and logical reasoning. Fine-tuned from LLaMa-3-8B, RATIONALYST improves the accuracy of reasoning by an average of 3.9% on 7 representative reasoning benchmarks. It also demonstrates superior performance compared to significantly larger verifiers like GPT-4 and similarly sized models fine-tuned on matching training sets.
Problem

Research questions and friction points this paper is trying to address.

Addresses incomplete reasoning steps in LLMs by mining implicit rationales
Improves reasoning accuracy via web-scale pre-training on extracted rationales
Generalizes across diverse reasoning tasks with minimal human intervention
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mining implicit rationales from unlabeled data
Web-scale pre-training for diverse reasoning tasks
Fine-tuning LLaMa-3-8B to improve reasoning accuracy
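At inference time, a rationale model used for process supervision can guide a reasoning agent by re-ranking candidate next steps against the implicit rationale it predicts for the current trace. The sketch below illustrates that control flow only; `rationale_for` and `step_score` are hypothetical stubs, not the released RATIONALYST weights or scoring interface.

```python
# Sketch of inference-time process supervision via an implicit-rationale model.
# Both helper functions are toy stubs standing in for real model calls.

def rationale_for(trace: str) -> str:
    # Stand-in for the rationale model predicting the unstated premise
    # behind the next reasoning step.
    return "each person contributes cakes, so add the counts"

def step_score(context: str, step: str) -> float:
    # Toy scorer: reward word overlap with the rationale-augmented context.
    return len(set(context.split()) & set(step.split()))

def choose_next_step(trace: str, candidates: list[str]) -> str:
    """Re-rank candidate reasoning steps conditioned on
    trace + predicted implicit rationale, and take the best one."""
    ctx = trace + " " + rationale_for(trace)
    return max(candidates, key=lambda c: step_score(ctx, c))

trace = "Alice brings 2 cakes and Bob brings 3 cakes"
candidates = ["so add the counts: 5 cakes", "so multiply: 6 cakes"]
print(choose_next_step(trace, candidates))
```

Here the rationale mentions adding the counts, so the addition step overlaps more with the augmented context and is selected; in the paper this supervision signal is what yields the reported accuracy gains over unguided generation.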