HOME-KGQA: A Benchmark Dataset for Multimodal Knowledge Graph Question Answering on Household Daily Activities

📅 2026-05-10

🤖 AI Summary

This work addresses the limitations of existing knowledge graph question answering (KGQA) benchmarks. These benchmarks are largely confined to encyclopedic knowledge and a single modality, and they lack fine-grained spatiotemporal information, so they do not reflect real-world scenarios such as embodied AI. To bridge this gap, we introduce the first multimodal KGQA benchmark tailored to everyday household activities, in which complex multi-hop natural language questions are paired with corresponding graph database queries. The dataset is built on a multimodal knowledge graph of household activities and emphasizes multi-level spatiotemporal reasoning, multimodal grounding, and aggregate functions. Experimental results show that current LLM-based KGQA methods perform substantially worse on this benchmark than on existing datasets. This reveals critical challenges for KGQA systems in realistic settings and fills an evaluation gap for embodied AI.

📝 Abstract

Large Language Models (LLMs) provide flexible natural language processing capabilities, while knowledge graphs (KGs) offer explicit and structured knowledge. Integrating these two in a complementary manner enables the development of reliable and verifiable AI systems. In particular, knowledge graph question answering (KGQA) has attracted attention as a means to reduce LLM hallucinations and to leverage knowledge beyond the training data. However, existing KGQA benchmark datasets are biased toward encyclopedic knowledge, limited to a single modality, and lack fine-grained spatiotemporal data, which limits their applicability to real-world scenarios targeted by Embodied AI. We introduce HOME-KGQA, a novel KGQA benchmark dataset built on a multimodal KG of daily household activities. HOME-KGQA consists of complex, multi-hop natural language questions paired with queries written in graph database query languages. Compared to existing benchmarks, it includes more challenging questions that involve multi-level spatiotemporal reasoning, multimodal grounding, and aggregate functions. Experimental results show that LLM-based KGQA methods fail to achieve performance comparable to that on existing datasets when evaluated on HOME-KGQA. This highlights significant challenges that should be addressed for the real-world deployment of KGQA systems. Our dataset is available at https://github.com/aistairc/home-kgqa
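To make the kind of question described above concrete, here is a minimal sketch of a multi-hop question that combines a spatial constraint, a temporal filter, and a COUNT aggregate over a toy triple store. Everything in it is an assumption for illustration: the entity names, the predicates (`action`, `object`, `room`, `startTime`, `type`), and the helper functions `build_index` and `count_events` are hypothetical and do not reflect HOME-KGQA's actual schema. In the benchmark itself, such questions are paired with queries in a graph database query language; here the "query" is hand-written in Python instead, and multimodal grounding is left out.

```python
from collections import defaultdict

# Hypothetical toy knowledge graph as (subject, predicate, object) triples.
# Names are illustrative only and are NOT taken from HOME-KGQA.
TRIPLES = [
    ("event1", "action", "PickUp"),
    ("event1", "object", "cup1"),
    ("event1", "room", "kitchen"),
    ("event1", "startTime", 36000),  # seconds since midnight (10:00)
    ("event2", "action", "PickUp"),
    ("event2", "object", "cup2"),
    ("event2", "room", "kitchen"),
    ("event2", "startTime", 39600),  # 11:00
    ("event3", "action", "PickUp"),
    ("event3", "object", "book1"),
    ("event3", "room", "livingroom"),
    ("event3", "startTime", 40000),
    ("cup1", "type", "Cup"),
    ("cup2", "type", "Cup"),
    ("book1", "type", "Book"),
]


def build_index(triples):
    """Index triples as a plain dict: subject -> predicate -> list of objects."""
    index = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        index[s][p].append(o)
    # Convert to plain dicts so later lookups never insert new keys.
    return {s: dict(preds) for s, preds in index.items()}


def count_events(index, action, obj_type, room, after):
    """Count events matching an action, object class, and room, strictly after a time.

    Hop 1: event -> action / room / startTime (direct properties).
    Hop 2: event -> object -> type (resolve the object's class).
    Aggregate: COUNT over the matching events.
    """
    count = 0
    for props in index.values():
        if action not in props.get("action", []):
            continue
        if room not in props.get("room", []):
            continue
        if not any(t > after for t in props.get("startTime", [])):
            continue
        objects = props.get("object", [])
        if any(obj_type in index.get(o, {}).get("type", []) for o in objects):
            count += 1
    return count


# "How many times was a cup picked up in the kitchen after 10:00?"
idx = build_index(TRIPLES)
print(count_events(idx, "PickUp", "Cup", "kitchen", after=36000))  # prints 1
```

The filter is strict (`t > after`), so the pick-up logged at exactly 10:00 is excluded and only `event2` is counted. Even this toy question needs two hops, a spatial constraint, a temporal comparison, and an aggregate, which is the combination the abstract identifies as challenging for current LLM-based KGQA methods.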

Problem

Research questions and friction points this paper is trying to address.

Knowledge Graph Question Answering

Multimodal

Spatiotemporal Reasoning

Embodied AI

Benchmark Dataset

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal Knowledge Graph

Knowledge Graph Question Answering

Spatiotemporal Reasoning