🤖 AI Summary
This work addresses the challenge of accurately translating natural-language logical problems into first-order logic (FOL) expressions with large language models (LLMs), a task where existing approaches fall short on both logical correctness and syntactic consistency. To bridge this gap, we introduce LogicPO, the first preference-optimization dataset designed specifically for logical formalization, and pioneer the application of preference-learning techniques, including Direct Preference Optimization (DPO) and Kahneman-Tversky Optimization (KTO), to this task. Using open-source models such as Phi-3.5, we combine supervised fine-tuning with preference optimization to model logical structure holistically. Experiments show that our method produces 10% more logically correct outputs than GPT-3.5-turbo (8-shot) and reduces the syntax error rate by 14%, validating the effectiveness of preference learning for logical formalization.
📝 Abstract
Logical reasoning is a key task for artificial intelligence due to its role in major downstream tasks such as Question Answering and Summarization. Recent methods for improving the reasoning ability of LLMs fall short in correctly converting a natural-language reasoning problem into an equivalent logical formulation, which limits the framework's overall ability to reason. To address this, we propose fine-tuning on a preference-optimization dataset to learn to parse and represent a natural-language problem as a whole into a consistent logical program, by 1) introducing LogicPO, a new supervised and preference-optimization dataset, and 2) adopting popular techniques such as Direct Preference Optimization (DPO) and Kahneman-Tversky Optimization (KTO) to fine-tune open-source LLMs. Our best model with Phi-3.5 consistently outperforms GPT-3.5-turbo (8-shot), producing 10% more logically correct outputs with 14% fewer syntax errors. Through our framework and improved evaluation metrics, we offer a promising direction for improving the logical reasoning of LLMs by better representing problems in their logical formulations.
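The preference-optimization step described above can be sketched with the standard DPO objective applied to a preference pair of FOL translations (a correct "chosen" formula vs. a flawed "rejected" one). The log-probability values below are purely illustrative, not from the paper; a real pipeline would score full output sequences under the policy and a frozen reference model.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    Inputs are sequence log-probabilities of the chosen (correct FOL)
    and rejected (flawed FOL) translations under the policy model and
    a frozen reference model; beta scales the implicit reward.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log sigmoid(margin): small when the policy prefers the chosen FOL
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Hypothetical log-probs for one NL -> FOL preference pair:
loss_good = dpo_loss(-5.0, -9.0, -6.0, -8.0)  # policy widened the margin
loss_bad = dpo_loss(-9.0, -5.0, -8.0, -6.0)   # policy prefers the bad FOL
assert loss_good < loss_bad
```

The loss pushes the policy to assign a larger probability margin to the correct formalization than the reference model does, which is how the whole-program consistency preference in LogicPO would be learned.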