🤖 AI Summary
Large-scale knowledge graph question answering (KGQA) systems lack comprehensive, interpretable, and scalable evaluation methodologies prior to deployment. Method: This paper proposes Chronos, a framework introducing a hierarchical and scalable evaluation paradigm tailored to multi-component KGQA systems. Chronos enables joint end-to-end and component-level assessment by integrating metric decoupling, dynamic data adaptation, and offline simulation techniques, unifying evaluation across natural language understanding, semantic parsing, and graph query execution. Contribution/Results: Evaluated on real-world industrial KGQA services, Chronos significantly improves fault localization efficiency and model iteration quality. It has been adopted as a core quality-gatekeeping tool, providing data-driven, pre-deployment performance estimation and decision support for KGQA system releases.
📝 Abstract
Question answering systems for knowledge graphs (KGQA) answer factoid questions based on the data in the knowledge graph. KGQA systems are complex because they must understand the relations and entities in knowledge-seeking natural language queries and map them to structured queries against the KG to answer them. In this paper, we introduce Chronos, a comprehensive evaluation framework for KGQA at industry scale. It is designed to evaluate such a multi-component system comprehensively, focusing on (1) end-to-end and component-level metrics, (2) scalability to diverse datasets, and (3) a scalable approach to measuring the performance of the system prior to release. We discuss the unique challenges associated with evaluating KGQA systems at industry scale, review the design of Chronos, and explain how it addresses these challenges. We demonstrate how Chronos provides a basis for data-driven decisions and discuss the challenges of using it to measure and improve a real-world KGQA system.
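To make the end-to-end vs. component-level distinction concrete, the sketch below scores a two-question KGQA evaluation set at three stages: entity linking, semantic parsing, and final answers. All record fields, queries, and the `accuracy` helper are hypothetical illustrations, not Chronos's actual interfaces; the point is that decoupled per-stage metrics let a failed answer be traced to the stage that broke.

```python
def accuracy(pairs):
    """Exact-match accuracy over (gold, predicted) pairs."""
    return sum(g == p for g, p in pairs) / len(pairs)

# Hypothetical evaluation records: each holds gold and predicted outputs
# for every pipeline stage, so stages can be scored independently.
records = [
    {"question": "Who wrote Hamlet?",
     "gold_entities": ("Hamlet",), "pred_entities": ("Hamlet",),
     "gold_query": "SELECT ?a WHERE { :Hamlet :author ?a }",
     "pred_query": "SELECT ?a WHERE { :Hamlet :author ?a }",
     "gold_answer": "William Shakespeare", "pred_answer": "William Shakespeare"},
    {"question": "Where was Marie Curie born?",
     "gold_entities": ("Marie Curie",), "pred_entities": ("Marie Curie",),
     "gold_query": "SELECT ?p WHERE { :Marie_Curie :birthPlace ?p }",
     # Semantic parsing picked the wrong relation, so the answer is wrong too.
     "pred_query": "SELECT ?p WHERE { :Marie_Curie :deathPlace ?p }",
     "gold_answer": "Warsaw", "pred_answer": "Paris"},
]

metrics = {
    "entity_linking": accuracy([(r["gold_entities"], r["pred_entities"]) for r in records]),
    "semantic_parsing": accuracy([(r["gold_query"], r["pred_query"]) for r in records]),
    "end_to_end": accuracy([(r["gold_answer"], r["pred_answer"]) for r in records]),
}
print(metrics)  # entity_linking 1.0, semantic_parsing 0.5, end_to_end 0.5
```

Here the end-to-end accuracy drop is localized immediately: entity linking is perfect, so the fault lies in semantic parsing rather than upstream understanding.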