🤖 AI Summary
Mapping diverse, multi-source terminology to underlying concepts in a unified taxonomy faces three key challenges: manual expert review is costly and subjective, existing automated methods handle nuanced, context-dependent semantic relationships poorly and are inconsistent across domains, and their reasoning is not transparent. To address these, this paper proposes an LLM-driven framework that integrates expert calibration with iterative prompt optimization. The framework combines expert-labeled examples, multi-stage prompt engineering, and human validation to generate taxonomy linkages together with supporting rationales. Evaluated on a domain-specific concept essentiality mapping task, it achieves an F1-score of 0.97, substantially exceeding the human benchmark of 0.68. The results suggest that taxonomy alignment can be scaled while keeping mapping quality high and preserving expert oversight for ambiguous cases.
📝 Abstract
A unified, coherent taxonomy is essential for effective knowledge representation in domain-specific applications, because diverse terminologies must be mapped to underlying concepts. Traditional manual approaches to taxonomy alignment rely on expert review of concept pairs, which becomes prohibitively expensive and time-consuming at scale, and subjective interpretations often lead to disagreement among experts. Existing automated methods have shown promise but are limited in handling nuanced semantic relationships and in maintaining consistency across domains. They often struggle with context-dependent concept mappings and lack transparent reasoning processes. We propose a novel framework that combines large language models (LLMs) with expert calibration and iterative prompt optimization to automate taxonomy alignment. Our method integrates expert-labeled examples, multi-stage prompt engineering, and human validation to guide LLMs in generating both taxonomy linkages and supporting rationales. Evaluated on a domain-specific mapping task of concept essentiality, our framework achieved an F1-score of 0.97, substantially exceeding the human benchmark of 0.68. These results demonstrate the effectiveness of our approach in scaling taxonomy alignment while maintaining high-quality mappings and preserving expert oversight for ambiguous cases.
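To make the reported metric and the idea of calibrating prompts against expert labels concrete, the following is a minimal sketch, not the authors' implementation. It computes binary F1 from expert labels and predictions, then keeps whichever candidate prompt scores highest on the expert-labeled pairs. The record layout `LabeledPair`, the `Classifier` callable standing in for an LLM call, and the function `best_prompt` are all hypothetical names introduced here for illustration; the abstract does not specify how prompts are scored or selected.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical record layout: (term, concept, expert_label).
LabeledPair = Tuple[str, str, bool]
# Hypothetical stand-in for an LLM call:
# (prompt, term, concept) -> (predicted_label, rationale).
Classifier = Callable[[str, str, str], Tuple[bool, str]]


def f1_score(gold: Sequence[bool], pred: Sequence[bool]) -> float:
    """Binary F1 = 2PR / (P + R); returns 0.0 when there are no true positives."""
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_prompt(
    prompts: List[str],
    labeled: List[LabeledPair],
    classify: Classifier,
) -> Tuple[str, float]:
    """Score each candidate prompt against the expert labels and keep the best.

    Assumption: prompt optimization is approximated here by picking the
    highest-F1 candidate; the paper's actual procedure may differ.
    """
    gold = [label for _, _, label in labeled]
    best, best_f1 = prompts[0], -1.0
    for prompt in prompts:
        pred = [classify(prompt, term, concept)[0] for term, concept, _ in labeled]
        score = f1_score(gold, pred)
        if score > best_f1:
            best, best_f1 = prompt, score
    return best, best_f1
```

In use, `classify` would wrap whatever model client is available and return the rationale alongside the label, so that pairs the model handles poorly can be routed back to experts for review.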