🤖 AI Summary
Ontology matching faces significant challenges, including pronounced semantic heterogeneity, scarcity of labeled data, and the difficulty of aligning complex structural representations. To address these, this paper proposes Agent-OM, a general-purpose ontology matching framework built on large language model (LLM) agents. The method introduces two Siamese agents: one performs cross-ontology entity retrieval, while the other conducts fine-grained semantic matching; a set of domain-specific OM tools augments the LLMs' ontological reasoning capabilities. By combining LLM-based semantic understanding with agent-driven autonomous planning, the framework achieves competitive performance on the OAEI benchmark: results close to the long-standing state of the art on simple tasks, and substantial improvements over existing SOTA systems on complex and few-shot OM tasks.
📝 Abstract
Ontology matching (OM) enables semantic interoperability between different ontologies and resolves their conceptual heterogeneity by aligning related entities. OM systems currently follow two prevailing design paradigms: conventional knowledge-based expert systems and newer machine learning-based predictive systems. While large language models (LLMs) and LLM agents have revolutionised data engineering and have been applied creatively in many domains, their potential for OM remains underexplored. This study introduces a novel agent-powered LLM-based design paradigm for OM systems. With consideration of several specific challenges in leveraging LLM agents for OM, we propose a generic framework, namely Agent-OM (Agent for Ontology Matching), consisting of two Siamese agents for retrieval and matching, equipped with a set of OM tools. Our framework is implemented in a proof-of-concept system. Evaluations on three Ontology Alignment Evaluation Initiative (OAEI) tracks against state-of-the-art OM systems show that our system can achieve results very close to the long-standing best performance on simple OM tasks and can significantly improve the performance on complex and few-shot OM tasks.
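The retrieve-then-match division of labour between the two Siamese agents can be illustrated with a minimal sketch. This is not the paper's implementation: the class names, the toy token-overlap similarity (standing in for LLM-backed semantic comparison and the OM tools), and the threshold are all illustrative assumptions.

```python
# Hypothetical sketch of a two-agent retrieve-then-match pipeline.
# All names and the toy similarity function are illustrative, not from Agent-OM.

def token_overlap(a: str, b: str) -> float:
    """Toy lexical similarity standing in for an LLM-backed comparison."""
    ta, tb = set(a.lower().split("_")), set(b.lower().split("_"))
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

class RetrievalAgent:
    """Proposes candidate entity pairs across the two input ontologies."""
    def __init__(self, top_k: int = 2):
        self.top_k = top_k

    def retrieve(self, source: list, target: list):
        candidates = []
        for s in source:
            # Rank target entities by similarity and keep the top-k as candidates.
            ranked = sorted(target, key=lambda t: token_overlap(s, t), reverse=True)
            candidates.extend((s, t) for t in ranked[: self.top_k])
        return candidates

class MatchingAgent:
    """Scores candidate pairs and keeps those above a confidence threshold."""
    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold

    def match(self, candidates):
        return [
            (s, t, token_overlap(s, t))
            for s, t in candidates
            if token_overlap(s, t) >= self.threshold
        ]

# Two toy ontologies, represented by entity labels only.
src = ["conference_paper", "review_author"]
tgt = ["paper", "author", "session_chair"]

pairs = RetrievalAgent().retrieve(src, tgt)       # agent 1: candidate retrieval
alignment = MatchingAgent().match(pairs)          # agent 2: fine-grained matching
print(alignment)  # → [('conference_paper', 'paper', 0.5), ('review_author', 'author', 0.5)]
```

In the actual framework, both stages would be driven by LLM agents with access to the OM tools rather than a lexical heuristic; the sketch only shows the pipeline shape (broad candidate retrieval followed by selective matching).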