🤖 AI Summary
Large-scale XML data generated by laboratory robots contains implicit semantic relationships and inconsistent labels, both of which hinder cross-laboratory interoperability. To address this, the paper proposes RELRaE, presented as the first LLM-driven framework for XML schema relationship extraction and label refinement. The method combines the relation-extraction and label-generation capabilities of large language models with structured schema analysis and human-in-the-loop validation, forming a closed-loop workflow of relation identification, semantic annotation, and label optimization. A quantitative evaluation module assesses label quality. Compared with conventional rule-based or supervised learning approaches, the framework is reported to improve labeling accuracy and scalability, enabling semi-automated ontology construction and knowledge graph generation. It offers a reusable methodology for achieving semantic interoperability in experimental automation.
📝 Abstract
Experiments carried out by robots in laboratories produce a large volume of XML data. To support the interoperability of data between labs, it is desirable to translate this XML data into a knowledge graph. A key stage of this process is enriching the XML schema to lay the foundation for an ontology schema. To achieve this, we present RELRaE, a framework that employs large language models at different stages to extract and accurately label the relationships implicitly present in the XML schema. We investigate how accurately LLMs can generate these labels and evaluate the labels they produce. Our work demonstrates that LLMs can be used effectively to support the generation of relationship labels in the context of lab automation, and that they can play a valuable role within semi-automatic ontology generation frameworks more generally.
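
To make "relationships implicitly present in the XML schema" concrete, here is a minimal sketch of one way parent-child element pairs could be pulled out of an XSD before any labeling takes place. It is an illustration under stated assumptions, not the RELRaE implementation: the schema fragment, the function names `extract_pairs` and `build_prompt`, and the prompt wording are all hypothetical, and no LLM is actually called.

```python
import xml.etree.ElementTree as ET

# Namespace prefix used by XML Schema (XSD) documents.
XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema fragment for illustration; not taken from the paper.
SAMPLE_XSD = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Experiment">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Sample" type="xs:string"/>
        <xs:element name="Instrument">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="Reading" type="xs:decimal"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def nested_children(element):
    """Yield the nearest xs:element declarations below `element`,
    looking through wrapper nodes such as xs:complexType and xs:sequence."""
    for child in element:
        if child.tag == XS + "element":
            yield child
        else:
            yield from nested_children(child)


def extract_pairs(xsd_text):
    """Return (parent, child) name pairs implied by element nesting."""
    root = ET.fromstring(xsd_text)
    pairs = []

    def walk(parent):
        for child in nested_children(parent):
            pairs.append((parent.get("name"), child.get("name")))
            walk(child)

    for top in root.findall(XS + "element"):
        walk(top)
    return pairs


def build_prompt(parent, child):
    """Hypothetical prompt text that a labeling step might send to an LLM."""
    return (
        f"Suggest a short verb-phrase label for the relationship "
        f"between '{parent}' and '{child}' in a laboratory XML schema."
    )


pairs = extract_pairs(SAMPLE_XSD)
for parent, child in pairs:
    print(parent, "->", child)
```

Each extracted pair is an unlabeled relationship candidate. In a semi-automatic pipeline of the kind the abstract describes, a model would propose a label for each pair and the proposals would then be evaluated; this sketch stops at building the prompt.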