Entailment-Preserving First-order Logic Representations in Natural Language Entailment

📅 2025-02-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of preserving entailment relations when natural language entailment instances are translated into first-order logic (FOL). It proposes a new task, Entailment-Preserving FOL representations (EPF), and introduces a family of reference-free evaluation metrics, the Entailment-Preserving Rate (EPR). In EPF, a model generates FOL representations for multi-premise entailment data such as EntailmentBank, and an automated prover checks whether the entailment labels are preserved. Because existing NL-to-FOL translation methods struggle on this task, the authors propose iterative learning-to-rank, a training method that directly optimizes the model's EPR score through a scoring function and a learning-to-rank objective. Across three datasets, the method improves EPR by 1.8-2.7% and EPR@16 by 17.4-20.6% over diverse baselines. Further analysis indicates that it reduces the arbitrariness of FOL representations by lowering the diversity of predicate signatures, and that it maintains strong performance across diverse inference types and out-of-domain data.

📝 Abstract

First-order logic (FOL) can represent the logical entailment semantics of natural language (NL) sentences, but determining natural language entailment using FOL remains a challenge. To address this, we propose the Entailment-Preserving FOL representations (EPF) task and introduce reference-free evaluation metrics for EPF, the Entailment-Preserving Rate (EPR) family. In EPF, one should generate FOL representations from multi-premise natural language entailment data (e.g., EntailmentBank) so that the automatic prover's result preserves the entailment labels. Experiments show that existing methods for NL-to-FOL translation struggle in EPF. To this end, we propose a training method specialized for the task, iterative learning-to-rank, which directly optimizes the model's EPR score through a novel scoring function and a learning-to-rank objective. Our method achieves a 1.8-2.7% improvement in EPR and a 17.4-20.6% increase in EPR@16 compared to diverse baselines on three datasets. Further analyses reveal that iterative learning-to-rank effectively suppresses the arbitrariness of FOL representation by reducing the diversity of predicate signatures, and maintains strong performance across diverse inference types and out-of-domain data.
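
To make the metric idea concrete, here is a minimal sketch of how an EPR-style score could be computed. It is not the authors' implementation. Several things are assumptions made for illustration: the function names `toy_prove`, `epr`, and `epr_at_k` are hypothetical; the toy prover only handles propositional facts and single-premise rules written as `a -> b`, standing in for a real first-order theorem prover; and EPR@K is read here as "an example counts if at least one combination of the K candidate representations per sentence preserves the label", which the abstract does not spell out.

```python
from itertools import product
from typing import Callable, Sequence

# A prover takes premise formulas and a hypothesis formula and returns
# True if the hypothesis is proved. This signature is an assumption.
Prover = Callable[[Sequence[str], str], bool]


def toy_prove(premises: Sequence[str], hypothesis: str) -> bool:
    """Toy stand-in for a theorem prover: forward chaining over
    atomic facts ("a") and single-premise rules ("a -> b")."""
    facts = {p.strip() for p in premises if "->" not in p}
    rules = [
        tuple(part.strip() for part in p.split("->", 1))
        for p in premises
        if "->" in p
    ]
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if body in facts and head not in facts:
                facts.add(head)
                changed = True
    return hypothesis.strip() in facts


def epr(examples, prove: Prover = toy_prove) -> float:
    """Fraction of examples whose prover result matches the label.
    Each example is (premise_formulas, hypothesis_formula, label)."""
    hits = sum(prove(p, h) == label for p, h, label in examples)
    return hits / len(examples)


def epr_at_k(examples, prove: Prover = toy_prove) -> float:
    """Assumed reading of EPR@K: each sentence has a list of candidate
    formulas, and an example counts if any combination matches the label."""
    hits = 0
    for premise_cands, hyp_cands, label in examples:
        combos = product(*premise_cands, hyp_cands)
        if any(prove(c[:-1], c[-1]) == label for c in combos):
            hits += 1
    return hits / len(examples)


# Top-1 representations: the second example uses a mismatched predicate.
top1 = [
    (["rains", "rains -> wet"], "wet", True),
    (["wet_ground", "rains -> wet"], "wet", True),
]

# Several candidates per sentence: a consistent combination exists.
candidates = [
    ([["rains"], ["rains -> wet"]], ["wet"], True),
    ([["wet_ground", "rains"], ["rains -> wet"]], ["wet", "is_wet"], True),
]

print(epr(top1))
print(epr_at_k(candidates))
```

The second top-1 example fails only because the premise and the rule use different predicate names, which is the kind of arbitrariness the abstract says the training method reduces. Note that the metric needs no gold FOL annotation: only the prover's verdict and the entailment label are compared.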

Problem

Research questions and friction points this paper is trying to address.

Determining natural language entailment using first-order logic

Existing NL-to-FOL translation methods struggle to preserve entailment

Training NL-to-FOL models to directly optimize entailment preservation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Entailment-Preserving FOL representations

Iterative learning-to-rank training

Reference-free EPR evaluation metrics
