Evaluation of Pipelines for Data Integration into Knowledge Graphs

📅 2026-05-21

🤖 AI Summary
This work addresses the lack of a general approach for evaluating knowledge graph integration pipelines, which makes it hard to compare methods systematically and choose among them. To close this gap, the paper introduces KGI-Bench, a benchmark for evaluating pipelines that integrate heterogeneous input data (structured, semi-structured, and unstructured) into an existing knowledge graph. KGI-Bench assesses the updated graph along three complementary dimensions: coverage, correctness, and consistency. Using benchmark datasets from the movie domain, the authors evaluate twelve integration pipelines and analyze how their behavior varies with input data format and design choices. The results are presented as a demonstration of the benchmark's applicability and usefulness for comparing knowledge graph integration approaches.
📝 Abstract
Integrating new data into knowledge graphs (KGs) typically involves different tasks that are executed within workflows or pipelines. There are many possible pipelines for a specific integration problem, but there is not yet a general approach to evaluate the overall quality and performance of such pipelines in order to determine the best choices. We therefore propose a new benchmark, KGI-Bench, to evaluate integration pipelines that ingest different kinds of input data into an existing KG. We evaluate pipelines by analyzing their output, i.e., the updated KG, with three complementary quality metrics: coverage, correctness, and consistency. We also provide benchmark datasets (a seed KG, overlapping input data in three formats, and a reference KG as ground truth) for the movie domain. To demonstrate the applicability and usefulness of the proposed benchmark, we comparatively evaluate 12 pipelines and analyze their behavior across different input data formats and design choices.
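The abstract names the three metrics but does not define them, so the following is only an illustrative sketch of how an updated KG might be scored against a seed KG and a reference KG. It assumes that KGs are plain sets of (subject, predicate, object) triples, that coverage is the share of expected new triples that were added, that correctness is the share of added triples that match the reference, and that consistency is checked through single-valued (functional) predicates. These definitions, the function names, and the toy movie data are assumptions made for illustration, not the formulas used by KGI-Bench.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def coverage(output_kg: Set[Triple], reference_kg: Set[Triple], seed_kg: Set[Triple]) -> float:
    """Assumed definition: share of reference triples missing from the seed
    that the pipeline managed to add (recall-like)."""
    expected = reference_kg - seed_kg
    if not expected:
        return 1.0
    return len(expected & output_kg) / len(expected)


def correctness(output_kg: Set[Triple], reference_kg: Set[Triple], seed_kg: Set[Triple]) -> float:
    """Assumed definition: share of triples added by the pipeline
    that also appear in the reference KG (precision-like)."""
    added = output_kg - seed_kg
    if not added:
        return 1.0
    return len(added & reference_kg) / len(added)


def consistency(output_kg: Set[Triple], functional_predicates: Set[str]) -> float:
    """Assumed definition: share of (subject, predicate) pairs with a
    functional predicate that carry exactly one object value."""
    objects: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for s, p, o in output_kg:
        if p in functional_predicates:
            objects[(s, p)].add(o)
    if not objects:
        return 1.0
    single_valued = sum(1 for values in objects.values() if len(values) == 1)
    return single_valued / len(objects)


# Hypothetical toy data in the movie domain.
seed = {("m1", "title", "Alien")}
reference = seed | {
    ("m1", "year", "1979"),
    ("m1", "director", "p1"),
    ("m2", "title", "Heat"),
}
# The pipeline output misses the director and adds a conflicting year.
output = seed | {
    ("m1", "year", "1979"),
    ("m1", "year", "1980"),
    ("m2", "title", "Heat"),
}

cov = coverage(output, reference, seed)
cor = correctness(output, reference, seed)
con = consistency(output, {"title", "year"})
print(f"coverage={cov:.2f} correctness={cor:.2f} consistency={con:.2f}")
```

In this toy case each score comes to 2/3: one expected triple is missing, one added triple is wrong, and one of three functional (subject, predicate) pairs has two conflicting values. Keeping the three scores separate, rather than collapsing them into one number, mirrors the abstract's description of the metrics as complementary.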
Problem

Research questions and friction points this paper is trying to address.

knowledge graph
data integration
pipeline evaluation
quality metrics
benchmark
Innovation

Methods, ideas, or system contributions that make the work stand out.

knowledge graph integration
pipeline evaluation
KGI-Bench
quality metrics
benchmark dataset