🤖 AI Summary
Even when their deductions are correct, current language models reason inefficiently. In realistic scenarios that contain irrelevant information, they fail to identify and disregard semantic distractors, producing redundant derivations.
Method: We propose the first logic-programming–based framework for evaluating reasoning efficiency: (i) aligning natural-language proofs with minimal logical proofs to quantify path conciseness; (ii) constructing a mathematical word-problem dataset with controllable semantic distractor axioms; and (iii) using a logic executor to generate shortest proofs as efficiency baselines.
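Step (i) can be pictured with a small hypothetical sketch. Here proof steps are opaque strings and alignment is exact matching; the paper aligns natural-language proofs against executor-found shortest proofs, which requires semantic matching, so this is only an illustration of the conciseness measurement:

```python
def efficiency(model_steps, minimal_steps):
    """Score how concise a model's proof is relative to the shortest proof.

    model_steps: ordered proof steps extracted from the model's output.
    minimal_steps: steps of the shortest proof found by a logic executor.
    Returns (number of redundant steps, conciseness ratio in (0, 1]).
    """
    minimal = set(minimal_steps)
    redundant = [s for s in model_steps if s not in minimal]
    ratio = len(minimal) / max(len(model_steps), 1)
    return len(redundant), ratio

# A maximally efficient proof uses exactly the minimal steps:
print(efficiency(["p->q", "q->r"], ["p->q", "q->r"]))  # (0, 1.0)
# A detour through an irrelevant inference lowers the score:
print(efficiency(["x->y", "p->q", "q->r"], ["p->q", "q->r"]))  # (1, 0.666...)
```

Under this toy metric, a ratio of 1.0 with zero redundant steps corresponds to perfectly efficient deduction; detours through distractor axioms show up as extra steps outside the minimal proof.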
Results: Experiments show that mainstream models suffer significant accuracy degradation even under minimal semantic interference, and that their generated proofs consistently include irrelevant steps, empirically confirming inefficient deduction. This work is the first to systematically uncover and quantitatively assess deductive redundancy in large language models.
📝 Abstract
Modern language models (LMs) exhibit strong deductive reasoning capabilities, yet standard evaluations emphasize correctness while overlooking a key aspect of human-like reasoning: efficiency. In real-world reasoning scenarios, much of the available information is irrelevant, and effective deductive inference requires identifying and ignoring such distractions. We propose a framework for assessing LM reasoning efficiency through the lens of logic programming, introducing a simple method to align proofs written in natural language -- as generated by an LM -- with shortest proofs found by executing the logic program. Efficiency is quantified by measuring how well a model avoids unnecessary inference. Empirically, we construct a dataset of math word problems injected with varying numbers of irrelevant axioms that differ in semantic overlap with the goal theorem. We find that current LMs show marked accuracy declines under such conditions -- even with minimal, domain-consistent distractions -- and the proofs they generate frequently exhibit detours through irrelevant inferences.
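The role of the shortest proof as a baseline can be illustrated with a toy executor. This sketch assumes axioms are propositional Horn clauses and searches for the minimum number of rule applications; the names and representation are hypothetical, not the paper's actual logic-programming setup:

```python
def min_proof_steps(goal, facts, rules, _seen=frozenset()):
    """Minimum number of rule applications needed to derive `goal`.

    facts: set of atoms taken as given axioms.
    rules: list of (premises_tuple, conclusion) Horn clauses.
    Returns None if the goal is underivable.
    """
    if goal in facts:
        return 0
    if goal in _seen:  # guard against cyclic derivations
        return None
    best = None
    for premises, head in rules:
        if head != goal:
            continue
        sizes = [min_proof_steps(p, facts, rules, _seen | {goal})
                 for p in premises]
        if all(s is not None for s in sizes):
            cand = 1 + sum(sizes)
            if best is None or cand < best:
                best = cand
    return best

facts = {"a", "b"}
rules = [(("a", "b"), "c"), (("c",), "d")]
print(min_proof_steps("d", facts, rules))  # 2: derive c, then d

# Injecting distractor axioms leaves the shortest proof unchanged,
# which is what makes it usable as an efficiency baseline.
distractor_rules = rules + [(("x",), "y"), (("y",), "z")]
print(min_proof_steps("d", facts | {"x"}, distractor_rules))  # still 2
```

The key property this demonstrates: the minimal proof length is invariant to how many irrelevant axioms are added, so any extra steps in a model's proof under distraction can be attributed to inefficient deduction rather than to a harder proof obligation.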