Reliable Reasoning Beyond Natural Language

📅 2024-07-16
🏛️ arXiv.org
📈 Citations: 8
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) struggle with non-linear reasoning, such as iterative refinement, backtracking, and parallel chains of thought, under standard autoregressive text generation, leading to unreliable mathematical and logical inference. To address this, the paper proposes a neuro-symbolic collaboration framework: an LLM compiles natural-language problems into Prolog logic programs, which a symbolic engine then executes to perform verifiable, explicit deductive reasoning. Key contributions: (1) a new benchmark dataset for Non-Linear Reasoning (NLR); (2) a prompt-driven mechanism for logic-program generation; and (3) an end-to-end traceable reasoning pipeline. Experiments demonstrate substantial improvements over strong baselines, including GPT-4, on GSM8k, BIG-bench Navigate, and the NLR benchmark, including high zero-shot accuracy on problems requiring original solutions. This work establishes a reasoning paradigm that is logically verifiable and auditable step by step.

📝 Abstract
Despite their linguistic competence, Large Language Models (LLMs) often exhibit limitations in their ability to reason reliably and flexibly. To address this, we propose a neurosymbolic approach that prompts LLMs to extract and encode all relevant information from a problem statement as logical code statements, and then use a logic programming language (Prolog) to conduct the iterative computations of explicit deductive reasoning. Our approach significantly enhances the performance of LLMs on the standard mathematical reasoning benchmark, GSM8k, and the Navigate dataset from BIG-bench. Additionally, we introduce a novel dataset, the Non-Linear Reasoning (NLR) dataset, consisting of 55 unique word problems that target the shortcomings of the next-token prediction paradigm of LLMs and require complex non-linear reasoning but only basic arithmetic skills to solve. Our findings demonstrate that the integration of Prolog enables LLMs to achieve high performance on the NLR dataset, which even the most advanced language models (including GPT-4) fail to solve using text only.
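The pipeline the abstract describes — extract a problem's facts as logical statements, then let a symbolic engine carry out the deduction — can be sketched with a toy forward-chaining solver standing in for Prolog. The paper uses an actual Prolog engine and LLM-generated programs; the rule encoding and the solver below are illustrative assumptions, not the authors' implementation:

```python
# Toy stand-in for the symbolic step: an LLM would extract facts and
# rules like these from a word problem; a symbolic engine then derives
# the answer by explicit, traceable deduction rather than text generation.

def forward_chain(facts, rules):
    """Apply Horn-style rules (premises -> conclusion) to a fact set
    until no new facts can be derived (a fixed point is reached)."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in facts and all(p in facts for p in premises):
                facts.add(conclusion)
                changed = True
    return facts

# Facts extracted from: "Ann is taller than Bob, and Bob is taller
# than Carl. Is Ann taller than Carl?"
facts = {("taller", "ann", "bob"), ("taller", "bob", "carl")}

# Transitivity rule, instantiated over the known names. A real Prolog
# engine would unify variables instead of enumerating instances.
names = ["ann", "bob", "carl"]
rules = [
    ([("taller", x, y), ("taller", y, z)], ("taller", x, z))
    for x in names for y in names for z in names
    if len({x, y, z}) == 3
]

derived = forward_chain(facts, rules)
print(("taller", "ann", "carl") in derived)  # True: the query follows deductively
```

Every derived fact is the product of an explicit rule application, which is what makes this style of reasoning verifiable and auditable in a way that free-form chain-of-thought text is not.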
Problem

Research questions and friction points this paper is trying to address.

Addresses LLMs' unreliable reasoning on tasks that exceed the limits of linear natural-language generation.
Introduces a neurosymbolic approach that integrates Prolog for robust, verifiable reasoning.
Solves tasks requiring iteration and backtracking with high accuracy across benchmarks.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates a Prolog symbolic engine with LLMs
Shifts the LLM's role to extracting and encoding problem information as logical code
Achieves robust performance on complex reasoning tasks