Wiring the 'Why': A Unified Taxonomy and Survey of Abductive Reasoning in LLMs

📅 2026-04-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Research on abductive reasoning in large language models has long suffered from the absence of a unified framework, leading to conceptual ambiguity and fragmented task definitions. This work proposes the first two-stage formalization—hypothesis generation and hypothesis selection—and establishes a systematic taxonomy encompassing tasks, datasets, methods, and evaluation protocols. Through comprehensive literature review, benchmarking, and cross-model comparative analysis, the study empirically reveals performance disparities among existing models on abductive reasoning tasks. It further identifies critical limitations, including reliance on static evaluation setups and insufficient domain coverage, thereby laying a foundational theoretical and practical groundwork for future research in this area.
📝 Abstract
Despite its foundational role in human discovery and sense-making, abductive reasoning--the inference of the most plausible explanation for an observation--has been relatively underexplored in Large Language Models (LLMs). Even as LLMs advance rapidly, the study of abductive reasoning and its diverse facets has remained disjointed rather than cohesive. This paper presents the first survey of abductive reasoning in LLMs, tracing its trajectory from philosophical foundations to contemporary AI implementations. To address the widespread conceptual confusion and fragmented task definitions in the field, we establish a unified two-stage definition that formally categorizes prior work. This definition disentangles abduction into Hypothesis Generation, where models bridge epistemic gaps to produce candidate explanations, and Hypothesis Selection, where the generated candidates are evaluated and the most plausible explanation is chosen. Building on this foundation, we present a comprehensive taxonomy of the literature, categorizing prior work by abductive task, dataset, underlying methodology, and evaluation strategy. To ground our framework empirically, we conduct a compact benchmark study of current LLMs on abductive tasks, together with targeted comparative analyses across model sizes, model families, evaluation styles, and the distinct generation-versus-selection task typologies. Moreover, by synthesizing recent empirical results, we examine how LLM performance on abductive reasoning relates to deductive and inductive tasks, providing insights into their broader reasoning capabilities. Our analysis reveals critical gaps in current approaches--from static benchmark design and narrow domain coverage to narrow training frameworks and limited mechanistic understanding of abductive processes...
Problem

Research questions and friction points this paper addresses.

abductive reasoning
Large Language Models
taxonomy
benchmarking
reasoning capabilities
Innovation

Methods, ideas, or system contributions that make the work stand out.

abductive reasoning
Large Language Models
hypothesis generation
hypothesis selection
reasoning taxonomy
Moein Salimi
Department of Computer Engineering, Sharif University of Technology
Shaygan Adim
Department of Mathematical Sciences, Sharif University of Technology
Danial Parnian
Department of Computer Engineering, Sharif University of Technology
Nima Alighardashi
Department of Mathematical Sciences, Sharif University of Technology
Mahdi Jafari Siavoshani
Department of Computer Engineering, Sharif University of Technology
Mohammad Hossein Rohban
Associate Professor in Computer Engineering, Sharif University of Technology
Machine Learning · Statistics · Computational Biology