Presupposition and Reasoning in Conditionals: A Theory-Based Study of Humans and LLMs

📅 2026-05-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the lack of systematic evaluation of how large language models (LLMs) reason pragmatically about presupposition projection in conditional sentences. The authors construct a controlled dataset of conditionals and combine a behavioral experiment with human participants and a linguistically grounded automated evaluation framework that integrates, for the first time, a theory-driven checklist with an LLM-as-a-Judge methodology, enabling a parallel comparison of human and model performance. Results indicate that humans integrate probabilistic expectations with pragmatic cues when making judgments, whereas LLMs align only partially with human ratings and show no consistent capacity for pragmatic inference. The models whose ratings best match human ratings often lack coherent pragmatic reasoning, which suggests that their performance may reflect surface-level pattern matching rather than pragmatic competence.

📝 Abstract

Presupposition projection in conditionals is central to theories of meaning and pragmatics, yet it remains largely unevaluated in large language models. We address this gap through a parallel behavioral study comparing human judgments and LLM predictions on a normed dataset of conditional sentences that controls the relation between the antecedent and the projected presupposition. We collect likelihood ratings from 120 participants and four LLMs under matched contextual conditions. Results show that humans integrate probabilistic and pragmatic cues in their judgments, whereas LLMs show variable alignment with human patterns. Using a linguistically motivated checklist within an LLM-as-a-Judge framework, we further evaluate model reasoning. We observe that the models that best match human ratings often lack coherent pragmatic reasoning, while models with stronger reasoning produce less human-like judgments. These findings suggest that LLMs' performance on such tasks may result from surface pattern matching rather than pragmatic competence. Our findings highlight the importance of benchmarks grounded in linguistic theory for comparing humans and models.
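
The human-model comparison described in the abstract can be pictured with a small sketch. It averages participant likelihood ratings per item and measures how closely a model's ratings follow the same ordering, using Spearman rank correlation. The choice of Spearman correlation, the function names, and the data layout are all assumptions made for illustration; the abstract does not state which alignment measure or rating scale the authors used.

```python
from statistics import mean

def rank(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(rank(x), rank(y))

def alignment(human_ratings, model_ratings):
    """Hypothetical alignment score between humans and one model.

    human_ratings: dict mapping item id -> list of participant ratings
    model_ratings: dict mapping item id -> a single model rating
    """
    items = sorted(human_ratings)
    human_means = [mean(human_ratings[item]) for item in items]
    model_values = [model_ratings[item] for item in items]
    return spearman(human_means, model_values)
```

A score near 1 would mean the model orders the items much as the participants do, and a score near -1 would mean it reverses their ordering. As the abstract notes, a high score of this kind does not by itself show that the model reasons pragmatically, which is why the authors add a separate checklist-based evaluation of model reasoning.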

Problem

Research questions and friction points this paper is trying to address.

presupposition projection

conditionals

large language models

pragmatic reasoning

human-model comparison

Innovation

Methods, ideas, or system contributions that make the work stand out.

presupposition projection

conditional sentences

pragmatic reasoning