🤖 AI Summary
To address the challenges of inaccurate semantic capture and weak logical structure modeling in large language models (LLMs) for natural language-to-first-order logic (NL→FOL) translation, this paper proposes a “Parse–Translate–Verify” divide-and-conquer paradigm. Methodologically, it introduces (1) a novel Logical Dependency Structure (LDS) representation to explicitly encode intra-sentential logical dependencies; (2) a multi-path sequential translation framework enabling fine-grained, segmented generation; and (3) a dual verification mechanism comprising SAT-solver-based semantic equivalence checking and probabilistic formula selection. By synergistically integrating LLMs’ semantic understanding, custom graph-structured representations, and formal verification, the approach achieves significant improvements over existing neurosymbolic methods across seven logical reasoning benchmarks, establishing new state-of-the-art performance.
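The logical dependency structure described above, an atomic subsentence together with its dependents translated segment by segment, can be pictured with a small sketch. The class layout, field names, and example sentence below are illustrative assumptions for exposition, not the paper's actual representation.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a logical dependency structure (LDS): an atomic
# subsentence plus its dependent subsentences. Names are assumptions.
@dataclass
class LDS:
    atom: str                                   # atomic subsentence
    dependents: list["LDS"] = field(default_factory=list)

    def subsentences(self):
        """Yield subsentences bottom-up: dependents first, then the atom.
        This ordering suggests one possible sequential translation path."""
        for dep in self.dependents:
            yield from dep.subsentences()
        yield self.atom

# Illustrative parse of "Every student who passes the exam graduates."
lds = LDS("x graduates",
          [LDS("x is a student"), LDS("x passes the exam")])
print(list(lds.subsentences()))
```

Because a sentence can admit several such structures, and each structure several translation orders, an LLM would produce multiple candidate formulas per sentence, which is what the verification stage then compares.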
📝 Abstract
Complex logical reasoning tasks require long sequences of reasoning steps, on which even a large language model (LLM) with chain-of-thought prompting still falls short. To alleviate this issue, neurosymbolic approaches incorporate a symbolic solver: an LLM only translates a natural language problem into a satisfiability (SAT) problem consisting of first-order logic formulas, and a sound symbolic solver returns a mathematically correct solution. However, we discover that LLMs have difficulty capturing the complex logical semantics hidden in natural language during translation. To resolve this limitation, we propose Compositional First-Order Logic Translation. An LLM first parses a natural language sentence into newly defined logical dependency structures, each consisting of an atomic subsentence and its dependents, and then sequentially translates the parsed subsentences. Since multiple logical dependency structures and sequential translations are possible for a single sentence, we also introduce two verification algorithms to ensure more reliable results. We utilize a SAT solver to rigorously compare the semantics of the generated first-order logic formulas and select the most probable one. We evaluate the proposed method, dubbed CLOVER, on seven logical reasoning benchmarks and show that it outperforms previous neurosymbolic approaches and achieves new state-of-the-art results.
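The solver-based equivalence check can be illustrated in miniature: two candidate translations are semantically equal exactly when their non-equivalence (XOR) is unsatisfiable. The sketch below stands in for a real SAT solver with brute-force truth-table enumeration over propositional variables; the example sentence and formulas are assumptions, and the paper's actual check operates on first-order formulas.

```python
from itertools import product

# Hypothetical candidate translations of
# "If it rains, the ground is wet and slippery."
# Each formula is a Boolean function over a truth assignment.
def f1(rain, wet, slippery):
    return (not rain) or (wet and slippery)            # Rain -> (Wet & Slippery)

def f2(rain, wet, slippery):
    return ((not rain) or wet) and ((not rain) or slippery)  # (Rain->Wet) & (Rain->Slippery)

def equivalent(a, b, n_vars=3):
    """Semantic equivalence check: a == b iff (a XOR b) has no satisfying
    assignment. Truth-table enumeration here is a toy stand-in for a SAT solver."""
    return all(a(*v) == b(*v) for v in product([False, True], repeat=n_vars))

print(equivalent(f1, f2))  # True: the two translations agree on every assignment
```

In this toy case the two candidates are equivalent, so either could be kept; when candidates disagree, a selection rule (such as the paper's probabilistic choice among candidates) is needed to pick one.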