Advancing Natural Language Formalization to First Order Logic with Fine-tuned LLMs

📅 2025-09-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Translating natural language to first-order logic (FOL) remains a fundamental challenge in knowledge representation and formal reasoning. This work systematically evaluates large language models (LLMs) on this task, introducing two novel strategies: predicate conditioning—leveraging explicit predicate inventories—and multilingual joint training. We further propose a dual-axis evaluation framework grounded in logical equivalence and predicate alignment. Experiments demonstrate that Flan-T5-XXL substantially outperforms both GPT-4o and the symbolic system ccg2lambda, achieving 70% accuracy when provided with predicate lists—a 15–20% gain over baselines. Crucially, encoder-decoder architectures (e.g., T5 variants) exhibit superior logical generalization compared to decoder-only models, showing robust cross-dataset performance on MALLS, Willow, and FOLIO. These findings establish a scalable, rigorously evaluated, LLM-driven paradigm for FOL formalization.
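The predicate-conditioning strategy described above can be sketched as a prompt that supplies the model with an explicit predicate inventory alongside the sentence to formalize. The prompt wording, function name, and predicate names below are illustrative assumptions, not the paper's exact format.

```python
# Sketch of predicate conditioning: the model receives an explicit
# predicate inventory together with the sentence to translate.
# Prompt wording and predicate names are illustrative assumptions,
# not the paper's exact format.

def build_prompt(sentence: str, predicates: list[str]) -> str:
    """Compose a translation prompt that conditions on a predicate list."""
    inventory = ", ".join(predicates)
    return (
        "Translate the sentence into first-order logic.\n"
        f"Available predicates: {inventory}\n"
        f"Sentence: {sentence}\n"
        "FOL:"
    )

prompt = build_prompt(
    "All dogs are mammals.",
    ["Dog(x)", "Mammal(x)"],
)
print(prompt)
```

Conditioning on a fixed inventory turns open-ended predicate invention into a constrained selection problem, which is one plausible reading of why the reported gains concentrate there.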

📝 Abstract
Automating the translation of natural language to first-order logic (FOL) is crucial for knowledge representation and formal methods, yet remains challenging. We present a systematic evaluation of fine-tuned LLMs for this task, comparing architectures (encoder-decoder vs. decoder-only) and training strategies. Using the MALLS and Willow datasets, we explore techniques such as vocabulary extension, predicate conditioning, and multilingual training, and introduce metrics for exact match, logical equivalence, and predicate alignment. Our fine-tuned Flan-T5-XXL achieves 70% accuracy with predicate lists, outperforming GPT-4o, the chain-of-thought-capable DeepSeek-R1-0528, and symbolic systems such as ccg2lambda. Key findings: (1) predicate availability boosts performance by 15–20%; (2) T5 models surpass larger decoder-only LLMs; and (3) models generalize to unseen logical arguments (the FOLIO dataset) without task-specific training. While structural logic translation proves robust, predicate extraction emerges as the main bottleneck.
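Two of the evaluation axes mentioned in the abstract, exact match and predicate alignment, can be sketched with simple string processing. The normalization rule and the F1 formulation below are assumptions for illustration; the paper's actual metrics may differ in detail (and full logical-equivalence checking would additionally require a theorem prover).

```python
import re

# Sketch of two evaluation axes: normalized exact match and predicate
# alignment. Normalization and the F1 formulation are assumptions;
# the paper's metrics may differ in detail.

def normalize(fol: str) -> str:
    """Collapse whitespace so surface variation does not mask identity."""
    return re.sub(r"\s+", "", fol)

def predicates(fol: str) -> set[str]:
    """Extract predicate symbols: capitalized names followed by '('."""
    return set(re.findall(r"\b([A-Z]\w*)\s*\(", fol))

def predicate_f1(gold: str, pred: str) -> float:
    """F1 overlap between gold and predicted predicate inventories."""
    g, p = predicates(gold), predicates(pred)
    if not g and not p:
        return 1.0
    overlap = len(g & p)
    if overlap == 0:
        return 0.0
    precision = overlap / len(p)
    recall = overlap / len(g)
    return 2 * precision * recall / (precision + recall)

gold = "∀x (Dog(x) → Mammal(x))"
pred = "∀x (Dog(x) → Animal(x))"
exact = normalize(gold) == normalize(pred)
f1 = predicate_f1(gold, pred)
print(exact, round(f1, 2))  # exact match fails, predicate F1 is 0.5
```

Separating the two scores makes the paper's diagnosis visible: a prediction can get the logical structure right while losing points only on predicate choice, which is exactly the bottleneck the abstract identifies.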
Problem

Research questions and friction points this paper is trying to address.

Automating natural language to first-order logic translation
Evaluating fine-tuned LLMs for logical formalization tasks
Addressing predicate extraction as the main bottleneck
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuned Flan-T5-XXL for logic translation
Predicate conditioning boosts performance significantly
Generalization to unseen arguments without training
Felix Vossel
Osnabrück University, Germany
Till Mossakowski
Professor of Computer Science, University of Osnabrück
Logic, formal ontology, knowledge representation, neuro-symbolic AI
Björn Gehrke
Osnabrück University, Germany