🤖 AI Summary
Existing logical reasoning research often treats logical complexity and semantic complexity in isolation, limiting its ability to address core human reasoning challenges, including abstract propositions, contextual ambiguity, and conflicting stances. To bridge this gap, we propose LogicAgent: a multi-perspective reasoning framework grounded in the semiotic square, integrating semiotic-square-guided first-order logic deduction, existential import verification, and a three-valued (True/False/Uncertain) judgment mechanism to jointly model logical and semantic dimensions. We further introduce RepublicQA, a philosophy-grounded, high-difficulty benchmark designed to evaluate abstract reasoning and deep logical inference, addressing a critical gap in current evaluation paradigms. Experiments show that LogicAgent achieves a 6.25% average gain over strong baselines on RepublicQA and delivers an average 7.05% improvement across ProntoQA, ProofWriter, FOLIO, and ProverQA, significantly enhancing the robustness of large language models on semantically complex logical reasoning tasks.
📝 Abstract
Logical reasoning is a fundamental capability of large language models (LLMs). However, existing studies largely overlook the interplay between logical complexity and semantic complexity, resulting in methods that struggle with challenging scenarios involving abstract propositions, ambiguous contexts, and conflicting stances, all of which are central to human reasoning. To address this gap, we propose LogicAgent, a semiotic-square-guided framework designed to jointly handle logical complexity and semantic complexity. LogicAgent explicitly performs multi-perspective deduction in first-order logic (FOL), while mitigating vacuous reasoning through existential import checks paired with a three-valued decision scheme (True, False, Uncertain) that handles boundary cases more faithfully. Furthermore, to overcome the semantic simplicity and low logical complexity of existing datasets, we introduce RepublicQA, a benchmark that reaches college-level difficulty (FKGL = 11.94) and exhibits substantially greater lexical and structural diversity than prior benchmarks. RepublicQA is grounded in philosophical concepts, featuring abstract propositions and systematically organized contrary and contradictory relations, making it the most semantically rich resource for evaluating logical reasoning. Experiments demonstrate that LogicAgent achieves state-of-the-art performance on RepublicQA, with a 6.25% average gain over strong baselines, and generalizes effectively to mainstream logical reasoning benchmarks including ProntoQA, ProofWriter, FOLIO, and ProverQA, achieving an additional 7.05% average gain. These results highlight the effectiveness of our semiotic-grounded multi-perspective reasoning in boosting the logical performance of LLMs.
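To make the existential import check and three-valued judgment concrete, the sketch below evaluates a universal statement "All S are P" over a finite domain: when nothing in the domain satisfies S, the statement is vacuous, so the evaluator returns Uncertain instead of the classically vacuous True. This is an illustrative toy (the function names and data are our own), not the paper's implementation, which operates over LLM-generated FOL representations.

```python
from enum import Enum

class Truth(Enum):
    """Three-valued judgment: True, False, or Uncertain."""
    TRUE = "True"
    FALSE = "False"
    UNCERTAIN = "Uncertain"

def evaluate_universal(domain, subject_pred, object_pred):
    """Evaluate 'All S are P' with an existential import check.

    If no element of the domain satisfies S, the universal claim
    is vacuous, so we return UNCERTAIN rather than TRUE."""
    subjects = [x for x in domain if subject_pred(x)]
    if not subjects:  # existential import fails: no S exists
        return Truth.UNCERTAIN
    if all(object_pred(x) for x in subjects):
        return Truth.TRUE
    return Truth.FALSE

# 'All unicorns are white' over a domain with no unicorns is
# judged Uncertain, not vacuously True.
animals = [{"kind": "horse", "white": True}]
result = evaluate_universal(
    animals,
    lambda a: a["kind"] == "unicorn",
    lambda a: a["white"],
)
print(result)  # Truth.UNCERTAIN
```

A classical FOL evaluator would return True here, since the antecedent is never satisfied; flagging the case as Uncertain is what the abstract means by avoiding vacuous reasoning on boundary cases.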