AI Summary
This work addresses the challenge of efficiently supporting question answering and visualization over massive knowledge graphs, which are often too large to use directly. To this end, it introduces a personalized summarization approach grounded in coreset theory and tailored to users' query behaviors. The method uses sensitivity-based importance sampling to construct a compact, high-fidelity graph summary whose approximation error with respect to a given query workload is provably bounded. By modeling user queries, computing sensitivity scores, and building coresets from them, the approach substantially outperforms existing methods such as GLIMPSE and PPR on benchmark knowledge graphs including Freebase, Wikidata, and DBpedia. It achieves higher question-answering accuracy and structural coverage while retaining only a tiny fraction of the original graph.
Abstract
Knowledge Graphs (KGs) are used extensively across domains and in many applications. These KGs are often very large, which makes them unwieldy for tasks such as question answering and visualization. Summarizing a KG offers a viable alternative in such cases. Personalized KG summarization is particularly important in today's data-driven world because it captures each user's specific requirements based on their query patterns. Because a personalized summary retains only the relevant information, it is small, which substantially reduces storage requirements and query runtime. In this work, we adapt coreset theory to create personalized KG summaries. Given a dataset and a user-specific query workload, our approach samples a relevant subset of triples using sensitivity-based importance sampling, and we ensure that this subset approximates the characteristics of the full dataset with bounded approximation error. We define sensitivity scores that measure the importance of a triple with respect to a user's query workload; our coreset construction algorithm then uses these scores. We explicitly focus on personalized KG summarization by constructing summaries independently for each user, based on that user's query behaviour.
Our evaluation on Freebase, Wikidata, and DBpedia shows that our approach, COREKG, delivers higher query-answering accuracy and structural coverage than state-of-the-art methods such as GLIMPSE, PPR, iSummary, PEGASUS, and APEX$^2$, while requiring only a tiny fraction of the original graph.
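To make the sampling step described in the abstract concrete, here is a minimal Python sketch of generic sensitivity-based importance sampling over triples. It is not COREKG's algorithm. The abstract does not give the sensitivity definition, so `sensitivity_scores`, its `floor` parameter, `build_coreset`, `estimate`, and the choice to represent each query as the set of triples it matches are all hypothetical. The sketch follows the standard coreset recipe: sample triples with probability proportional to their sensitivity, then reweight each draw by 1/(m·p(t)). With this weighting, weighted sums over the summary are unbiased estimates of the corresponding sums over the full graph.

```python
import random
from collections import Counter
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def sensitivity_scores(
    triples: List[Triple], workload: List[Set[Triple]], floor: float = 1e-3
) -> Dict[Triple, float]:
    """Hypothetical sensitivity proxy (not the paper's definition).

    Each query is represented by the set of triples it matches. A triple's score
    is `floor` plus the largest share 1/|matches| it holds in any single query.
    Triples in small, selective answer sets therefore score highest, and every
    triple keeps a nonzero sampling probability.
    """
    scores = {t: floor for t in triples}
    for matches in workload:
        if not matches:
            continue
        share = 1.0 / len(matches)
        for t in matches:
            if t in scores:  # ignore matched triples that are not in the graph
                scores[t] = max(scores[t], floor + share)
    return scores


def build_coreset(
    triples: List[Triple], workload: List[Set[Triple]], m: int, seed: int = 0
) -> Dict[Triple, float]:
    """Draw m triples i.i.d. with p(t) proportional to sensitivity.

    Returns {triple: weight}, where each draw of t adds 1 / (m * p(t)).
    Assumes `triples` contains no duplicates.
    """
    scores = sensitivity_scores(triples, workload)
    total = sum(scores.values())
    prob = {t: s / total for t, s in scores.items()}
    rng = random.Random(seed)
    draws = rng.choices(triples, weights=[prob[t] for t in triples], k=m)
    counts = Counter(draws)
    return {t: c / (m * prob[t]) for t, c in counts.items()}


def estimate(coreset: Dict[Triple, float], matches: Set[Triple]) -> float:
    """Weighted count of a query's matches in the summary.

    Unbiased for the number of those matches that are present in the graph.
    """
    return sum(w for t, w in coreset.items() if t in matches)


if __name__ == "__main__":
    t1 = ("Alice", "knows", "Bob")
    t2 = ("Bob", "knows", "Carol")
    t3 = ("Alice", "worksAt", "Acme")
    t4 = ("Carol", "livesIn", "Paris")  # not touched by any query
    triples = [t1, t2, t3, t4]
    workload = [{t1, t2}, {t3}]
    coreset = build_coreset(triples, workload, m=1000)
    print(len(coreset), round(estimate(coreset, {t1, t2}), 2))
```

Triples that no query touches receive only the floor score, so they are rarely sampled. This keeps the hypothetical summary small and specific to one user's workload. The inverse-probability weights are what allow query-level quantities to be estimated from the sample.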