🤖 AI Summary
Existing methods for multi-hop logical query answering over multimodal knowledge graphs (MMKGs) struggle to model fine-grained subcomponents of multimodal entities—such as image regions or text spans—hindering precise logical reasoning.
Method: This work introduces first-order logic (FOL) reasoning over MMKG queries for the first time, proposing a coarse-cone embedding mechanism that jointly models both coarse-grained semantics and fine-grained substructures of entities. It enables differentiable geometric computation of the logical operators (conjunction, disjunction, negation), integrates multimodal feature fusion, and adopts a two-stage framework: candidate shortlisting followed by sub-entity localization.
Contribution/Results: The approach achieves significant improvements over state-of-the-art methods on four public MMKG benchmarks. Notably, it yields up to a 12.6% absolute gain in Hits@1 on complex queries involving negation or subcomponent answers, making it the first method capable of both precise multimodal subcomponent localization and end-to-end FOL reasoning.
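To make the "differentiable geometric computation of logical operators" concrete, here is a deliberately simplified sketch that is not the paper's actual model: each one-dimensional cone is treated as an angular interval `(lo, hi)` on the unwrapped circle, so conjunction becomes interval intersection, disjunction a covering hull, and negation the complement arc. The real method operates on learned high-dimensional embeddings and handles wrap-around properly; all names below are hypothetical.

```python
import math

# Toy 1-D cone: an angular interval (lo, hi) with lo <= hi, on the
# unwrapped circle [-pi, pi]. Purely illustrative; not RConE's parameterization.

def conjunction(a, b):
    """FOL conjunction: intersection of two cones; None if they are disjoint."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None

def disjunction(a, b):
    """FOL disjunction, relaxed to the smallest cone covering both operands."""
    return (min(a[0], b[0]), max(a[1], b[1]))

def negation(c, full=(-math.pi, math.pi)):
    """FOL negation: complement within the full circle. For simplicity we
    keep only the larger remaining arc (a real model handles wrap-around)."""
    left = (full[0], c[0])
    right = (c[1], full[1])
    return max((left, right), key=lambda iv: iv[1] - iv[0])

def contains(c, x):
    """Membership test: does angle x fall inside cone c?"""
    return c is not None and c[0] <= x <= c[1]
```

In the actual model these operations are smooth functions of the embedding parameters, which is what makes the whole query-answering pipeline trainable end to end.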
📝 Abstract
Multi-hop query answering over a Knowledge Graph (KG) involves traversing one or more hops from a start node to answer a query. Path-based and logic-based methods are the state of the art for multi-hop question answering: the former are used for link prediction, while the latter answer complex logical queries. Logical multi-hop querying embeds the KG and the queries in the same embedding space. Existing work incorporates First-Order Logic (FOL) operators, such as conjunction ($\wedge$), disjunction ($\vee$), and negation ($\neg$), in queries. Although current models have most of the building blocks needed to execute FOL queries, they cannot exploit the dense information carried by multi-modal entities in Multi-Modal Knowledge Graphs (MMKGs). We propose RConE, an embedding method that captures the multi-modal information needed to answer a query. The model first shortlists candidate (multi-modal) entities containing the answer, and then finds the solution (sub-entities) within those entities. Several existing works tackle path-based question answering in MMKGs; however, to our knowledge, we are the first to introduce logical constructs in querying MMKGs and to answer queries whose answers are sub-entities of multi-modal entities. Extensive evaluation on four publicly available MMKGs shows that RConE outperforms the current state of the art.
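The two-stage procedure described above (shortlist candidate multimodal entities, then localize a sub-entity within them) can be sketched as a minimal pipeline skeleton. Everything here is hypothetical scaffolding: `score` stands in for the learned cone-embedding scorer and is reduced to a toy dot product over fixed feature vectors.

```python
from dataclasses import dataclass, field

@dataclass
class SubEntity:
    """A fine-grained component, e.g. an image region or text span."""
    name: str
    feat: tuple

@dataclass
class Entity:
    """A coarse multimodal entity holding its sub-entities."""
    name: str
    feat: tuple
    sub_entities: list = field(default_factory=list)

def score(query_feat, feat):
    # Stand-in for the learned query-entity compatibility score.
    return sum(q * f for q, f in zip(query_feat, feat))

def answer(query_feat, entities, top_k=2):
    # Stage 1: shortlist candidate multimodal entities by query score.
    shortlist = sorted(entities, key=lambda e: score(query_feat, e.feat),
                       reverse=True)[:top_k]
    # Stage 2: localize the best-scoring sub-entity within the shortlist.
    subs = [s for e in shortlist for s in e.sub_entities]
    return max(subs, key=lambda s: score(query_feat, s.feat)).name
```

Note that the second stage can return a sub-entity of an entity that was not the top-1 candidate, which is why the shortlist (rather than a single best entity) matters.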