Large Language Model for OWL Proofs

📅 2026-01-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the limitations of large language models (LLMs) in generating faithful and readable logical proofs, particularly within the context of OWL ontology reasoning, where systematic evaluation has been lacking. The work proposes a novel three-stage automated evaluation framework (extraction, simplification, and explanation), augmented with an analysis of premise logical completeness, to construct a dedicated dataset for assessing the reasoning and explanatory capabilities of mainstream LLMs. Experimental results indicate that while models perform reasonably well overall, their performance degrades significantly on cases of high logical complexity. Notably, logical complexity exerts a far greater impact on model performance than syntactic representation format. Furthermore, input noise and incomplete premises substantially undermine the quality of generated proofs.

📝 Abstract
The ability of Large Language Models (LLMs) to perform reasoning tasks such as deduction has been widely investigated in recent years. Yet, their capacity to generate proofs (faithful, human-readable explanations of why conclusions follow) remains largely underexplored. In this work, we study proof generation in the context of OWL ontologies, which are widely adopted for representing and reasoning over complex knowledge, by developing an automated dataset construction and evaluation framework. Our evaluation encompasses three sequential tasks for complete proving: Extraction, Simplification, and Explanation, as well as an additional task assessing the Logic Completeness of the premises. Through extensive experiments on widely used reasoning LLMs, we obtain several important findings: (1) some models achieve strong overall results but remain limited on complex cases; (2) logical complexity, rather than representation format (formal logic language versus natural language), is the dominant factor shaping LLM performance; and (3) noise and incompleteness in input data substantially diminish LLMs' performance. Together, these results underscore both the promise of LLMs for explanation with rigorous logics and the remaining gap in supporting resilient reasoning under complex or imperfect conditions. Code and data are available at https://github.com/HuiYang1997/LLMOwlR.
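To make the pipeline described in the abstract concrete, the following is a heavily simplified, hypothetical sketch of the three sequential tasks (Extraction, Simplification, Explanation) and the Logic Completeness check, restricted to toy SubClassOf axioms. The actual framework operates on full OWL ontologies; all function names here are illustrative assumptions, not the paper's code.

```python
# Toy setting: an ontology is a set of (sub, sup) pairs, each meaning
# "sub SubClassOf sup"; entailment is just transitive closure.

def entails(axioms, goal):
    """Logic Completeness check: do the given axioms entail the goal?"""
    sub, sup = goal
    reachable = {sub}
    changed = True
    while changed:
        changed = False
        for a, b in axioms:
            if a in reachable and b not in reachable:
                reachable.add(b)
                changed = True
    return sup in reachable

def extract(ontology, goal):
    """Extraction: pull out the axioms reachable from the goal's subclass,
    a coarse relevant subset of the full ontology."""
    sub, _ = goal
    reachable = {sub}
    changed = True
    while changed:
        changed = False
        for a, b in ontology:
            if a in reachable and b not in reachable:
                reachable.add(b)
                changed = True
    return {(a, b) for a, b in ontology if a in reachable}

def simplify(axioms, goal):
    """Simplification: greedily drop axioms not needed for the entailment,
    yielding a minimal premise set (a justification) for the goal."""
    core = set(axioms)
    for axiom in list(core):
        if entails(core - {axiom}, goal):
            core.discard(axiom)
    return core

def explain(ontology, goal):
    """Explanation: render the minimal premise set as readable sentences."""
    core = simplify(extract(ontology, goal), goal)
    steps = [f"every {a} is a {b}" for a, b in sorted(core)]
    return "Since " + " and ".join(steps) + f", every {goal[0]} is a {goal[1]}."
```

For example, with the ontology `{("A","B"), ("B","C"), ("C","D"), ("X","Y")}` and goal `("A","C")`, extraction discards `("X","Y")`, simplification reduces the premises to `{("A","B"), ("B","C")}`, and explanation renders the chain in natural language; removing either remaining axiom makes `entails` fail, mirroring the paper's finding that incomplete premises undermine proof quality.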
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Proof Generation
OWL Ontologies
Logical Reasoning
Explanation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models
OWL Ontologies
Proof Generation
Logical Reasoning
Automated Evaluation Framework