Reasoning over Uncertain Text by Generative Large Language Models

📅 2024-02-14
📈 Citations: 3
Influential: 0
🤖 AI Summary
Large language models (LLMs) exhibit limited capability in explicit probabilistic reasoning—particularly Bayesian inference. To address this, we introduce BLInD, the first fine-grained benchmark dedicated to evaluating Bayesian reasoning in LLMs, and propose a multi-paradigm prompting framework that reformulates probabilistic inference as Python code generation, probabilistic algorithm invocation, and ProbLog-based logical inference. This unified approach jointly models uncertainty, causal structure, and logical constraints. Our method improves average accuracy across multiple state-of-the-art LLMs on BLInD by +28.6% and demonstrates strong cross-domain generalization on causal question-answering tasks. Key contributions are: (1) the first fine-grained Bayesian inference evaluation benchmark; (2) an interpretable, verifiable multi-paradigm probabilistic prompting framework; and (3) a systematic characterization of fundamental limitations of LLMs in probabilistic modeling, including failures in conditional independence reasoning, causal graph construction, and constraint propagation.

📝 Abstract
This paper considers the challenges Large Language Models (LLMs) face when reasoning over text that includes information involving uncertainty explicitly quantified via probability values. This type of reasoning is relevant to a variety of contexts, ranging from everyday conversations to medical decision-making. Despite improvements in the mathematical reasoning capabilities of LLMs, they still exhibit significant difficulties with probabilistic reasoning. To address this problem, we introduce the Bayesian Linguistic Inference Dataset (BLInD), a new dataset specifically designed to test the probabilistic reasoning capabilities of LLMs. We use BLInD to identify the limitations of LLMs on tasks involving probabilistic reasoning. In addition, we present several prompting strategies that map the problem to different formal representations, including Python code, probabilistic algorithms, and probabilistic logical programming. We conclude with an evaluation of our methods on BLInD and on an adaptation of a causal reasoning question-answering dataset. Our empirical results highlight the effectiveness of our proposed strategies for multiple LLMs.
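The abstract describes mapping a probabilistic word problem to Python code as one of the prompting strategies. A minimal sketch of the kind of executable target such a strategy might produce, using an invented rain/wet-ground problem (the numbers and variable names are illustrative, not taken from BLInD):

```python
# Hypothetical illustration of the "map to Python code" strategy: instead of
# reasoning over the probabilities in natural language, the model emits code
# that computes the answer exactly.
#
# Invented word problem: "The probability of rain is 0.3. If it rains, the
# ground is wet with probability 0.9; if not, with probability 0.2. Given
# that the ground is wet, what is the probability that it rained?"

p_rain = 0.3
p_wet_given_rain = 0.9
p_wet_given_no_rain = 0.2

# Law of total probability: P(wet) = P(wet|rain)P(rain) + P(wet|~rain)P(~rain)
p_wet = p_wet_given_rain * p_rain + p_wet_given_no_rain * (1 - p_rain)

# Bayes' rule: P(rain|wet) = P(wet|rain)P(rain) / P(wet)
p_rain_given_wet = p_wet_given_rain * p_rain / p_wet

print(round(p_rain_given_wet, 4))  # → 0.6585
```

Executing the generated code offloads the arithmetic, so any remaining errors come from the translation of text to probabilities rather than from calculation.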
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Probabilistic Reasoning
Uncertainty Handling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bayesian Linguistic Inference Dataset (BLInD)
Probabilistic Reasoning Enhancement Strategies
Aliakbar Nafar
Michigan State University
Kristen Brent Venable
Florida Institute for Human and Machine Cognition, University of West Florida
Parisa Kordjamshidi
Associate Professor, CSE, Michigan State University
Natural Language Processing, Vision & Language, Neurosymbolic AI, Spatial Language Understanding