🤖 AI Summary
In high-assurance domains such as law and medicine, defeasible rules (e.g., "P holds generally, but exception Q invalidates P") pose significant reasoning challenges: a single exception can overturn a general conclusion, undermining the accuracy and verifiability of LLM reasoning. To address this, we propose LOGicalThought (LogT), a neurosymbolic architecture that constructs a dual context, a symbolic graph context alongside a logic-based context, to map long-form textual reasoning into compact, verifiable logical evaluation that explicitly models negation, implication, and defeasible inference. By integrating an advanced logical language, a symbolic reasoner, and a large language model, LogT enables structured inference over rule-exception structures. Evaluated on four multi-domain benchmarks, it improves overall performance by 11.84%, with gains of up to +10.2% on negation, +13.2% on implication, and +5.5% on defeasible reasoning over the strongest baseline, substantially strengthening rigorous, trustworthy reasoning in high-assurance settings.
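To make the rule-exception pattern concrete, below is a minimal, hypothetical sketch of defeasible inference: a default conclusion that a single defeating fact overturns. It illustrates the logical behavior only, not LogT's actual logical language or reasoner, and all names in it are invented.

```python
# Toy illustration of defeasible inference: a general rule yields a default
# conclusion, and a single matching exception defeats (overturns) it.
# Hypothetical sketch only; not LOGicalThought's logical language or reasoner.

def defeasible_infer(facts: set[str]) -> bool:
    """Decide 'may_fly' under the defeasible rule 'birds generally fly'."""
    # General (defeasible) rule P: if x is a bird, conclude x may fly.
    conclusion = "bird" in facts
    # Exceptions Q: any one defeating fact invalidates the default conclusion.
    exceptions = {"penguin", "broken_wing"}
    if facts & exceptions:
        conclusion = False
    return conclusion

print(defeasible_infer({"bird"}))             # True: default holds
print(defeasible_infer({"bird", "penguin"}))  # False: exception defeats P
```

This non-monotonic behavior, where adding a fact removes a conclusion, is exactly what distinguishes defeasible reasoning from classical entailment.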
📝 Abstract
High-assurance reasoning, particularly in critical domains such as law and medicine, requires conclusions that are accurate, verifiable, and explicitly grounded in evidence. Such reasoning relies on premises codified from rules, statutes, and contracts, and it is inherently defeasible or non-monotonic: because of numerous exceptions, the introduction of a single fact can invalidate a general rule, posing significant challenges. While large language models (LLMs) excel at processing natural language, their capabilities on standard inference tasks do not translate to the rigorous reasoning required over high-assurance text guidelines. Core reasoning challenges within such texts often manifest as specific logical structures involving negation, implication, and, most critically, defeasible rules and exceptions. In this paper, we propose a novel neurosymbolically grounded architecture called LOGicalThought (LogT) that uses an advanced logical language and reasoner in conjunction with an LLM to construct a dual context: a symbolic graph context and a logic-based context. These two representations transform the problem from inference over long-form guidelines into a compact, grounded evaluation. Evaluated on four multi-domain benchmarks against four baselines, LogT improves overall performance by 11.84% across all LLMs. Performance improves significantly across all three modes of reasoning: by up to +10.2% on negation, +13.2% on implication, and +5.5% on defeasible reasoning compared to the strongest baseline.
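As a rough illustration of what a "compact grounded evaluation" could look like, the sketch below encodes guideline rules as structured premise/exception entries and answers a query symbolically rather than by prompting over the full guideline text. The rule content, names, and data structures are all invented assumptions; the paper's actual symbolic graph and logic-based contexts are not reproduced here.

```python
# Toy sketch of compact grounded evaluation: guideline rules are encoded as
# structured entries (a stand-in for symbolic/logic contexts extracted from
# long-form text) and a query is evaluated symbolically over given facts.
# All rules and names below are hypothetical.

from dataclasses import dataclass

@dataclass
class Rule:
    premise: frozenset[str]     # facts required to trigger the rule
    conclusion: str             # literal derived when the rule fires
    exceptions: frozenset[str]  # facts that defeat the rule

RULES = [
    Rule(frozenset({"prescription_valid"}), "may_dispense",
         frozenset({"drug_recalled", "patient_allergy"})),
]

def evaluate(query: str, facts: set[str]) -> bool:
    """Forward-chain once over defeasible rules, honoring exceptions."""
    derived = set(facts)
    for rule in RULES:
        if rule.premise <= derived and not (rule.exceptions & derived):
            derived.add(rule.conclusion)
    return query in derived

print(evaluate("may_dispense", {"prescription_valid"}))                   # True
print(evaluate("may_dispense", {"prescription_valid", "drug_recalled"}))  # False
```

Because each conclusion traces back to explicit rules, facts, and exceptions, an evaluation of this shape is verifiable in a way that free-form LLM inference over the raw guideline is not.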