Can LLMs Solve ASP Problems? Insights from a Benchmarking Study (Extended Version)

📅 2025-07-25
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing evaluations of large language models (LLMs) on Answer Set Programming (ASP) suffer from significant limitations: overly simplistic test programs, inadequate support for negation, disjunction, and multiple answer sets, and the absence of a dedicated benchmark. Method: We introduce ASPBench, the first comprehensive, task-oriented benchmark for ASP, comprising three core tasks—ASP entailment, answer set verification, and answer set computation—to systematically assess 14 state-of-the-art LLMs on nonmonotonic reasoning and complex logical structures. Contribution/Results: Empirical results reveal that current LLMs perform poorly on fundamental ASP solving tasks—especially answer set computation—achieving only marginal accuracy even on basic instances, thereby exposing critical deficits in deep symbolic reasoning. ASPBench provides a rigorous evaluation framework and empirical foundation for advancing neuro-symbolic integration.

📝 Abstract
Answer Set Programming (ASP) is a powerful paradigm for non-monotonic reasoning. Recently, large language models (LLMs) have demonstrated promising capabilities in logical reasoning. Despite this potential, current evaluations of LLM capabilities in ASP are often limited: existing works typically employ overly simplified ASP programs and do not support negation, disjunction, or multiple answer sets. Furthermore, there is a lack of benchmarks with tasks specifically designed for ASP solving. To bridge this gap, we introduce ASPBench, a comprehensive ASP benchmark including three ASP-specific tasks: ASP entailment, answer set verification, and answer set computation. Our extensive evaluations on ASPBench reveal that while 14 state-of-the-art LLMs, including *deepseek-r1*, *o4-mini*, and *gemini-2.5-flash-thinking*, perform relatively well on the first two simpler tasks, they struggle with answer set computation, which is the core of ASP solving. These findings offer insights into the current limitations of LLMs in ASP solving and highlight the need for approaches that integrate symbolic reasoning capabilities more effectively. The code and dataset are available at https://github.com/HomuraT/ASPBench.
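To make the three benchmark tasks concrete, the sketch below (illustrative only, not the paper's code) brute-forces answer set computation for a tiny normal logic program with negation, using the Gelfond-Lifschitz reduct that defines the stable model semantics. The toy program `p :- not q.  q :- not p.` is a hypothetical example chosen here; it has two answer sets, {p} and {q}, exactly the multi-answer-set behavior the paper says earlier benchmarks fail to cover.

```python
from itertools import chain, combinations

# A normal logic program: each rule is (head, positive_body, negative_body).
# Toy example (assumed for illustration):  p :- not q.   q :- not p.
rules = [
    ("p", set(), {"q"}),
    ("q", set(), {"p"}),
]
atoms = {"p", "q"}

def least_model(positive_rules):
    """Least fixpoint (minimal model) of a negation-free program."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, body in positive_rules:
            if body <= model and head not in model:
                model.add(head)
                changed = True
    return model

def is_answer_set(candidate, rules):
    """Answer set verification: check candidate against the GL reduct."""
    # Reduct: drop rules whose negative body intersects the candidate,
    # then strip negation from the surviving rules.
    reduct = [(h, pos) for h, pos, neg in rules if not (neg & candidate)]
    return least_model(reduct) == candidate

def answer_sets(atoms, rules):
    """Answer set computation by exhaustive search over interpretations."""
    subsets = chain.from_iterable(
        combinations(sorted(atoms), r) for r in range(len(atoms) + 1))
    return [set(s) for s in subsets if is_answer_set(set(s), rules)]

print(answer_sets(atoms, rules))  # → [{'p'}, {'q'}]
```

The exponential enumeration is only viable for toy instances; real solvers such as clingo use conflict-driven search. ASP entailment can then be phrased on top of this: an atom is cautiously entailed if it belongs to every answer set, bravely entailed if it belongs to at least one.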
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs' limited capabilities in solving Answer Set Programming (ASP) problems.
Addressing the lack of comprehensive benchmarks for ASP-specific tasks.
Identifying LLMs' struggles with core ASP tasks like answer set computation.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces ASPBench for comprehensive ASP benchmarking
Evaluates 14 LLMs on ASP-specific tasks
Highlights need for better symbolic reasoning integration
Lin Ren
School of Computer Science and Engineering, Southeast University, Nanjing, China
Guohui Xiao
Professor of Computer Science, Southeast University, China
Artificial Intelligence, Knowledge Representation, Knowledge Graphs, Large Language Models
Guilin Qi
Southeast University
Artificial Intelligence, Ontology
Yishuai Geng
School of Computer Science and Engineering, Southeast University, Nanjing, China
Haohan Xue
School of Computer Science and Engineering, Southeast University, Nanjing, China