FCoReBench: Can Large Language Models Solve Challenging First-Order Combinatorial Reasoning Problems?

📅 2024-02-04
📈 Citations: 5
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) perform poorly on first-order combinatorial reasoning problems—such as graph coloring, knapsack, and cryptarithmetic—which are NP-hard and can be instantiated at arbitrary sizes; their accuracy drops sharply as instance size grows, even when the LLM is aided by a symbolic solver. To study this systematically, the paper introduces FCoReBench, a benchmark of 40 such problems with scripts that generate instances of varying sizes and automatically verify and generate solutions. It further proposes SymPro-LM, which combines LLMs with both symbolic solvers and program interpreters, guided by feedback from a few solved examples. SymPro-LM is robust to changes in problem size and, unlike earlier approaches, requires no LLM calls at inference time. Experiments show large accuracy gains on FCoReBench, and the approach also generalizes to other logical reasoning benchmarks.

📝 Abstract
Can large language models (LLMs) solve challenging first-order combinatorial reasoning problems such as graph coloring, knapsack, and cryptarithmetic? By first-order, we mean these problems can be instantiated with a potentially infinite number of problem instances of varying sizes. They are also challenging, being NP-hard and requiring several reasoning steps to reach a solution. While existing work has focused on coming up with datasets with hard benchmarks, there is limited work that exploits the first-order nature of the problem structure. To address this challenge, we present FCoReBench, a dataset of 40 such challenging problems, along with scripts to generate problem instances of varying sizes and automatically verify and generate their solutions. We first observe that LLMs, even when aided by symbolic solvers, perform rather poorly on our dataset, being unable to leverage the underlying structure of these problems. We specifically observe a drop in performance with increasing problem size. In response, we propose a new approach, SymPro-LM, which combines LLMs with both symbolic solvers and program interpreters, along with feedback from a few solved examples, to achieve huge performance gains. Our proposed approach is robust to changes in the problem size, and has the unique characteristic of not requiring any LLM call during inference time, unlike earlier approaches. As an additional experiment, we also demonstrate SymPro-LM's effectiveness on other logical reasoning benchmarks.
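To make the "no LLM call at inference" property concrete, here is a minimal sketch (not the paper's code) of the kind of program an LLM might generate once, offline, for one FCoReBench-style problem — graph k-coloring via backtracking. Once such a program passes validation, it runs on instances of any size with no further LLM involvement. All names here are illustrative.

```python
# Illustrative sketch: a backtracking k-coloring solver of the kind an
# LLM could generate once, offline. At inference it runs standalone on
# instances of arbitrary size, with no LLM calls.

def k_coloring(n, edges, k):
    """Return color[i] in range(k) for vertices 0..n-1, or None if no
    proper k-coloring exists."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    colors = [-1] * n

    def backtrack(v):
        if v == n:
            return True
        for c in range(k):
            # A color is legal if no already-colored neighbor uses it.
            if all(colors[u] != c for u in adj[v]):
                colors[v] = c
                if backtrack(v + 1):
                    return True
        colors[v] = -1  # undo and signal failure to the caller
        return False

    return colors[:] if backtrack(0) else None

# A triangle is not 2-colorable but is 3-colorable.
triangle = [(0, 1), (1, 2), (0, 2)]
assert k_coloring(3, triangle, 2) is None
sol = k_coloring(3, triangle, 3)
assert sol is not None and all(sol[u] != sol[v] for u, v in triangle)
```

Note that exact solvers like this still take exponential time in the worst case (the problems are NP-hard); the point is that the reasoning runs in a program or symbolic solver rather than inside the LLM, so correctness no longer degrades with instance size.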
Problem

Research questions and friction points this paper is trying to address.

Evaluate LLMs on first-order combinatorial reasoning problems.
Propose SymPro-LM to enhance LLM performance on NP-hard problems.
Introduce FCoReBench for scalable problem instance generation and verification.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines LLMs with symbolic solvers and interpreters
Introduces SymPro-LM for robust problem size handling
Eliminates LLM calls during inference for efficiency
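The example-guided feedback mechanism can be sketched as a simple validation loop: run the candidate LLM-written program on a few solved instances and collect failure reports that would be fed back for refinement. The function and problem encoding below are hypothetical, chosen only to illustrate the loop.

```python
# Illustrative sketch of example-guided feedback (names hypothetical):
# run a candidate program on solved instances, collect failure reports.

def feedback_from_examples(program, solved, verify):
    """Return a list of (instance, reason) failure reports; an empty
    list means the candidate passed all solved examples."""
    failures = []
    for instance, expected in solved:
        try:
            out = program(instance)
        except Exception as e:
            failures.append((instance, f"crash: {e!r}"))
            continue
        reason = verify(instance, out, expected)
        if reason:
            failures.append((instance, reason))
    return failures

# Toy 0/1 knapsack: instance = (capacity, [(weight, value), ...]),
# expected = optimal total value, program returns chosen item indices.
def verify_knapsack(instance, chosen, expected):
    cap, items = instance
    w = sum(items[i][0] for i in chosen)
    v = sum(items[i][1] for i in chosen)
    if w > cap:
        return f"over capacity: {w} > {cap}"
    if v < expected:
        return f"suboptimal: value {v} < optimum {expected}"
    return None  # passes this example

# A flawed candidate (greedy by value) is caught by a solved example:
greedy = lambda inst: [max(range(len(inst[1])), key=lambda i: inst[1][i][1])]
solved = [((5, [(4, 10), (3, 7), (2, 6)]), 13)]
fails = feedback_from_examples(greedy, solved, verify_knapsack)
assert fails and "suboptimal" in fails[0][1]
```

In a SymPro-LM-style pipeline, such failure reports would be returned to the LLM to revise the program; once the report list is empty, the validated program alone handles all future instances.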