🤖 AI Summary
Existing evaluations of large language models (LLMs) suffer from fragmentation, weak theoretical grounding, and poor interpretability, particularly regarding LLMs' capabilities across the four fundamental logical reasoning paradigms: deduction, induction, abduction, and analogy.
Method: We introduce the first unified, multi-paradigm evaluation framework, empirically benchmarking LLMs on formal reasoning datasets including ProofWriter and FOLIO. We further propose a neuro-symbolic hybrid architecture, RLHF- and RFT-driven reasoning optimization, and verification-guided sampling to jointly enhance logical fidelity.
Contribution/Results: Our analysis reveals systematic generalization failures and deficits in logical rigor across all four paradigms. The framework provides a methodologically rigorous, reproducible foundation for trustworthy AI reasoning, identifies concrete bottlenecks (e.g., inconsistent premise handling in abduction, fragile analogical mapping), and delineates pathways for future advancement, including formal logic integration, structured self-refinement, and paradigm-aware prompting.
📝 Abstract
With the emergence of advanced reasoning models like OpenAI o3 and DeepSeek-R1, large language models (LLMs) have demonstrated remarkable reasoning capabilities. However, their ability to perform rigorous logical reasoning remains an open question. This survey synthesizes recent advancements in logical reasoning within LLMs, a critical area of AI research. It outlines the scope of logical reasoning in LLMs, its theoretical foundations, and the benchmarks used to evaluate reasoning proficiency. We analyze existing capabilities across the four reasoning paradigms (deductive, inductive, abductive, and analogical) and assess strategies to enhance reasoning performance, including data-centric tuning, reinforcement learning, decoding strategies, and neuro-symbolic approaches. The review concludes with future directions, emphasizing the need for further research to strengthen logical reasoning in AI systems.