An Empirical Study of Conformal Prediction in LLM with ASP Scaffolds for Robust Reasoning

📅 2025-03-07
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the unreliability of open-weight large language models (LLMs) on complex, multi-step spatial reasoning. The authors propose a verifiable reasoning framework that integrates conformal language modelling (CLM) with answer set programming (ASP): CLM calibrates the sets of ASP programs generated by the LLM, ASP solving enforces logical consistency, and an LLM-as-Judge mechanism scores and filters candidate outputs. The core contribution is embedding conformal prediction's statistical guarantees and ASP's formal reasoning into the LLM reasoning chain, yielding calibrated confidence in the generated programs. Experiments on the StepGame dataset show that the approach significantly outperforms standard sampling baselines across levels of reasoning complexity, and that the LLM-as-Judge component improves precision in identifying logically correct outputs, though calibration does not fully generalize to tasks requiring much longer reasoning chains.

📝 Abstract
In this paper, we examine the use of Conformal Language Modelling (CLM) alongside Answer Set Programming (ASP) to enhance the performance of standard open-weight LLMs on complex multi-step reasoning tasks. Using the StepGame dataset, which requires spatial reasoning, we apply CLM to generate sets of ASP programs from an LLM, providing statistical guarantees on the correctness of the outputs. Experimental results show that CLM significantly outperforms baseline models that use standard sampling methods, achieving substantial accuracy improvements across different levels of reasoning complexity. Additionally, the LLM-as-Judge metric enhances CLM's performance, especially in assessing structurally and logically correct ASP outputs. However, calibrating CLM with diverse calibration sets did not improve generalizability for tasks requiring much longer reasoning steps, indicating limitations in handling more complex tasks.
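The CLM procedure described above (sample candidate ASP programs from the LLM, admit those whose calibrated confidence is high enough, stop once the set is good enough) can be sketched as follows. This is a minimal illustration, not the paper's implementation: `generate`, `score`, and the thresholds `lam_quality` and `lam_stop` are hypothetical stand-ins for the LLM sampler, the confidence function, and the values obtained from conformal calibration.

```python
# Sketch of a Conformal Language Modelling-style sampling loop.
# Assumptions (not from the paper): generate() stands in for LLM sampling of
# an ASP program, score() for a calibrated confidence function, and the two
# lambda thresholds for values chosen on a held-out calibration set.
import random


def generate(prompt: str, seed: int) -> str:
    """Toy stand-in for LLM sampling: returns an ASP-like fact."""
    rng = random.Random(seed)
    return f"left(x, z, {rng.randint(0, 3)})."


def score(program: str) -> float:
    """Toy stand-in for a confidence score in [0, 1]."""
    return random.Random(program).random()


def conformal_sample(prompt: str, k_max: int = 20,
                     lam_quality: float = 0.6,
                     lam_stop: float = 0.9) -> list[str]:
    """Sample up to k_max candidates; keep those above the quality
    threshold; stop early once the set's best score clears the
    stopping threshold."""
    output_set: list[str] = []
    for seed in range(k_max):
        candidate = generate(prompt, seed)
        if score(candidate) >= lam_quality and candidate not in output_set:
            output_set.append(candidate)
        if output_set and max(score(c) for c in output_set) >= lam_stop:
            break
    return output_set


programs = conformal_sample("X is above Y. Y is left of Z. Where is X?")
```

With conformal calibration, the thresholds are set so that, with a user-chosen probability, the returned set contains at least one correct program; the toy scorer here only illustrates the control flow.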
Problem

Research questions and friction points this paper is trying to address.

Enhance LLM performance on multi-step reasoning tasks
Apply CLM with ASP for robust spatial reasoning
Evaluate CLM's limitations in complex reasoning tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Conformal Language Modelling enhances LLM reasoning.
Answer Set Programming ensures output correctness guarantees.
LLM-as-Judge metric improves ASP output assessment.
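The LLM-as-Judge step named in the bullets above can be sketched as a simple filter: each candidate ASP program receives a judge confidence score, and only candidates at or above an acceptance threshold survive into the final set. The function name, threshold value, and scores below are hypothetical illustrations, not the paper's actual interface.

```python
# Hypothetical sketch of LLM-as-Judge filtering: candidates maps ASP program
# text to a judge confidence in [0, 1]; programs judged structurally or
# logically implausible fall below the threshold and are dropped.
def judge_filter(candidates: dict[str, float],
                 threshold: float = 0.75) -> list[str]:
    return [prog for prog, conf in candidates.items() if conf >= threshold]


scored = {
    "left(x, z).": 0.92,   # well-formed, logically plausible
    "left(x x z)": 0.10,   # malformed ASP, judged low
}
kept = judge_filter(scored)  # → ['left(x, z).']
```

In the paper's pipeline this judgment complements ASP solving: the solver catches logical inconsistency, while the judge score prioritizes among the remaining candidates.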