ABD: Default Exception Abduction in Finite First Order Worlds

📅 2026-02-21
🤖 AI Summary
This work formalizes default-exception abduction: given a finite first-order theory with an abnormality predicate, the goal is to infer a concise first-order definition of the abnormal cases that restores satisfiability. The authors construct ABD, a benchmark for this task covering three observation regimes (closed-world, existential completion, and universal completion), and use exact SMT-based verification to check candidate formulas. An evaluation of ten frontier large language models on 600 instances shows that the best models achieve high validity but still leave parsimony gaps, meaning their definitions mark more exceptions than necessary, and holdout evaluation reveals distinct generalization failure modes across the regimes.

📝 Abstract
We introduce ABD, a benchmark for default-exception abduction over finite first-order worlds. Given a background theory with an abnormality predicate and a set of relational structures, a model must output a first-order formula that defines exceptions, restoring satisfiability while keeping exceptions sparse. We formalize three observation regimes (closed-world, existential completion, universal completion) with exact SMT verification. Evaluating ten frontier LLMs on 600 instances, the best models achieve high validity but parsimony gaps remain, and holdout evaluation reveals distinct generalization failure modes across regimes.
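
The task described in the abstract can be illustrated with a minimal sketch. It assumes a closed-world reading, a single hypothetical default rule (Bird(x) and not Ab(x) implies Flies(x)), and brute-force checking over tiny finite worlds in place of the SMT verification the paper uses. The predicate names, the world encoding, and the helpers `violations` and `evaluate` are illustrative assumptions, not part of ABD.

```python
from typing import Callable, Dict, List, Set, Tuple

# A finite world: a domain plus unary predicates stored as sets.
# Closed-world assumption: an element not listed in a predicate's set is false for it.
World = Dict[str, Set[str]]
# A candidate abnormality definition: decides Ab(x) inside a given world.
AbDef = Callable[[World, str], bool]


def violations(world: World, ab: AbDef) -> Set[str]:
    """Elements breaking the hypothetical default  Bird(x) & ~Ab(x) -> Flies(x)."""
    return {
        x
        for x in world["domain"]
        if x in world["Bird"] and x not in world["Flies"] and not ab(world, x)
    }


def evaluate(worlds: List[World], ab: AbDef) -> Tuple[bool, int]:
    """Return (valid, cost): valid if no world violates the default,
    cost = total number of elements marked abnormal across all worlds."""
    valid = all(not violations(w, ab) for w in worlds)
    cost = sum(1 for w in worlds for x in w["domain"] if ab(w, x))
    return valid, cost


# Two toy worlds (illustrative data only).
worlds: List[World] = [
    {"domain": {"a", "b", "c"}, "Bird": {"a", "b", "c"},
     "Flies": {"a", "b"}, "Penguin": {"c"}},
    {"domain": {"d", "e"}, "Bird": {"d"}, "Flies": {"d"}, "Penguin": set()},
]

# Three candidate definitions of Ab(x).
ab_penguin: AbDef = lambda w, x: x in w["Penguin"]  # Ab(x) := Penguin(x)
ab_all_birds: AbDef = lambda w, x: x in w["Bird"]   # Ab(x) := Bird(x)
ab_none: AbDef = lambda w, x: False                 # Ab(x) := false

for name, ab in [("penguin", ab_penguin), ("all_birds", ab_all_birds), ("none", ab_none)]:
    print(name, evaluate(worlds, ab))
```

In this toy setting, `ab_penguin` and `ab_all_birds` both restore satisfiability, but the second marks every bird as an exception. That difference between a valid definition and a sparse one is the kind of parsimony gap the abstract refers to, while `ab_none` marks nothing and leaves the default violated.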

Problem

Research questions and friction points this paper is trying to address.

default-exception abduction
finite first-order worlds
abnormality predicate
satisfiability
sparse exceptions

Innovation

Methods, ideas, or system contributions that make the work stand out.

default-exception abduction
finite first-order worlds
abnormality predicate
SMT verification
parsimony