🤖 AI Summary
Background: Large language models (LLMs) exhibit semantic hallucinations, such as spurious alignments, fabricated entities, and logical inconsistencies, when performing ontology matching (OM), undermining their reliability in knowledge-intensive tasks.
Method: We introduce OAEI-LLM, the first LLM hallucination benchmark tailored to ontology matching, extending the Ontology Alignment Evaluation Initiative (OAEI) datasets. We formally define and annotate LLM-specific hallucination types in ontology matching, propose a hallucination-aware data construction paradigm with schema extension mechanisms (sketched below), and ensure annotation reliability via multi-stage validation: human verification, rule-based consistency checks, expert review, and cross-model agreement analysis.
Contribution/Results: We release a multi-domain ontology alignment benchmark with fine-grained hallucination annotations, enabling rigorous evaluation of hallucination detection methods, robust semantic alignment techniques, and trustworthy LLM-based ontology matching (LLM-OM) approaches.
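To make the schema extension concrete, here is a minimal sketch of how an OAEI-style alignment cell could be extended with the LLM's judgement and a hallucination label. The class names, field names, enum values, and placeholder IRIs are illustrative assumptions, not the dataset's actual schema or taxonomy.

```python
from dataclasses import dataclass
from enum import Enum


class HallucinationType(Enum):
    # Illustrative categories; the benchmark's own taxonomy may differ.
    SPURIOUS_ALIGNMENT = "spurious_alignment"        # a match asserted where the reference has none
    FABRICATED_ENTITY = "fabricated_entity"          # the LLM invents a class or property
    LOGICAL_INCONSISTENCY = "logical_inconsistency"  # a match that contradicts ontology axioms
    NONE = "none"                                    # LLM output agrees with the reference


@dataclass
class AnnotatedCell:
    """One OAEI-style alignment cell extended with the LLM's judgement
    and a hallucination label (hypothetical layout)."""
    source_entity: str          # IRI in the source ontology
    target_entity: str          # IRI in the target ontology
    reference_relation: str     # gold-standard relation, e.g. "=" or "none"
    llm_relation: str           # relation asserted by the LLM
    hallucination_type: HallucinationType


# Placeholder IRIs from example.org; not real benchmark entries.
cell = AnnotatedCell(
    source_entity="http://example.org/onto1#Lecturer",
    target_entity="http://example.org/onto2#Department",
    reference_relation="none",
    llm_relation="=",
    hallucination_type=HallucinationType.SPURIOUS_ALIGNMENT,
)
print(cell.hallucination_type.value)  # -> "spurious_alignment"
```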
📝 Abstract
Hallucinations of large language models (LLMs) commonly occur in domain-specific downstream tasks, and ontology matching (OM) is no exception. The growing use of LLMs for OM raises the need for benchmarks to better understand LLM hallucinations. The OAEI-LLM dataset is an extended version of the Ontology Alignment Evaluation Initiative (OAEI) datasets that evaluates LLM-specific hallucinations in OM tasks. We outline the methodology used in dataset construction and schema extension, and provide examples of potential use cases.
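One potential use case is scoring an LLM-based matcher against the benchmark's hallucination annotations. The sketch below assumes a hypothetical record layout (dict fields such as "hallucination_type") and toy entries; it simply reports the per-type fraction of annotated cells.

```python
from collections import Counter


def hallucination_breakdown(records):
    """Return the fraction of records per annotated hallucination type."""
    counts = Counter(r["hallucination_type"] for r in records)
    total = len(records)
    return {h_type: n / total for h_type, n in counts.items()}


# Toy records standing in for benchmark entries joined with an LLM matcher's output.
records = [
    {"source": "ex:Professor", "target": "ex:FacultyMember", "hallucination_type": "none"},
    {"source": "ex:Course",    "target": "ex:Department",    "hallucination_type": "spurious_alignment"},
    {"source": "ex:Student",   "target": "ex:GradStudent",   "hallucination_type": "logical_inconsistency"},
]

print(hallucination_breakdown(records))
# e.g. {'none': 0.33, 'spurious_alignment': 0.33, 'logical_inconsistency': 0.33}
```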