🤖 AI Summary
Existing methods for multi-hop logical query answering over multimodal knowledge graphs (MMKGs) struggle to model fine-grained subcomponents of multimodal entities—such as image regions or text spans—hindering precise logical reasoning.
Method: This work introduces first-order logic (FOL) reasoning over MMKG queries for the first time, proposing a coarse-cone embedding mechanism that jointly models both coarse-grained semantics and fine-grained substructures of entities. It enables differentiable geometric computation of the logical operators (conjunction, disjunction, negation), integrates multimodal feature fusion, and adopts a two-stage framework: candidate shortlisting followed by sub-entity localization.
Contribution/Results: The approach achieves significant improvements over state-of-the-art methods on four public MMKG benchmarks. Notably, it yields up to a 12.6% absolute gain in Hits@1 on complex queries involving negation or subcomponent answers, making it the first method capable of both precise multimodal subcomponent localization and end-to-end FOL reasoning.
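To make the "differentiable geometric computation of logical operators" concrete, here is a deliberately simplified sketch that is not the paper's actual model: each one-dimensional cone is treated as an angular interval `(lo, hi)` on the unwrapped circle, so conjunction becomes interval intersection, disjunction a covering hull, and negation the complement arc. The real method operates on learned high-dimensional embeddings and handles wrap-around properly; all names below are hypothetical.

```python
import math

# Toy 1-D cone: an angular interval (lo, hi) with lo <= hi, on the
# unwrapped circle [-pi, pi]. Purely illustrative; not RConE's parameterization.

def conjunction(a, b):
    """FOL conjunction: intersection of two cones; None if they are disjoint."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None

def disjunction(a, b):
    """FOL disjunction, relaxed to the smallest cone covering both operands."""
    return (min(a[0], b[0]), max(a[1], b[1]))

def negation(c, full=(-math.pi, math.pi)):
    """FOL negation: complement within the full circle. For simplicity we
    keep only the larger remaining arc (a real model handles wrap-around)."""
    left = (full[0], c[0])
    right = (c[1], full[1])
    return max((left, right), key=lambda iv: iv[1] - iv[0])

def contains(c, x):
    """Membership test: does angle x fall inside cone c?"""
    return c is not None and c[0] <= x <= c[1]
```

In the actual model these operations are smooth functions of the embedding parameters, which is what makes the whole query-answering pipeline trainable end to end.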
📝 Abstract
Multi-hop query answering over a Knowledge Graph (KG) involves traversing one or more hops from a start node to answer a query. Path-based and logic-based methods are the state of the art for multi-hop question answering: the former are used for link prediction, while the latter answer complex logical queries. Logical multi-hop querying embeds the KG and the queries in the same embedding space. Existing work incorporates First-Order Logic (FOL) operators, such as conjunction ($\wedge$), disjunction ($\vee$), and negation ($\neg$), in queries. Although current models have most of the building blocks needed to execute FOL queries, they cannot exploit the dense information carried by multi-modal entities in Multi-Modal Knowledge Graphs (MMKGs). We propose RConE, an embedding method that captures the multi-modal information needed to answer a query. The model first shortlists candidate (multi-modal) entities containing the answer, and then finds the solution (sub-entities) within those entities. Several existing works tackle path-based question answering in MMKGs; however, to our knowledge, we are the first to introduce logical constructs in querying MMKGs and to answer queries whose answers are sub-entities of multi-modal entities. Extensive evaluation on four publicly available MMKGs shows that RConE outperforms the current state of the art.
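The two-stage procedure described above (shortlist candidate multimodal entities, then localize a sub-entity within them) can be sketched as a minimal pipeline skeleton. Everything here is hypothetical scaffolding: `score` stands in for the learned cone-embedding scorer and is reduced to a toy dot product over fixed feature vectors.

```python
from dataclasses import dataclass, field

@dataclass
class SubEntity:
    """A fine-grained component, e.g. an image region or text span."""
    name: str
    feat: tuple

@dataclass
class Entity:
    """A coarse multimodal entity holding its sub-entities."""
    name: str
    feat: tuple
    sub_entities: list = field(default_factory=list)

def score(query_feat, feat):
    # Stand-in for the learned query-entity compatibility score.
    return sum(q * f for q, f in zip(query_feat, feat))

def answer(query_feat, entities, top_k=2):
    # Stage 1: shortlist candidate multimodal entities by query score.
    shortlist = sorted(entities, key=lambda e: score(query_feat, e.feat),
                       reverse=True)[:top_k]
    # Stage 2: localize the best-scoring sub-entity within the shortlist.
    subs = [s for e in shortlist for s in e.sub_entities]
    return max(subs, key=lambda s: score(query_feat, s.feat)).name
```

Note that the second stage can return a sub-entity of an entity that was not the top-1 candidate, which is why the shortlist (rather than a single best entity) matters.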