Subtraction Gets You More: Gap-Aware Retrieval for Multimodal Multi-Hop QA

📅 2026-05-27
🤖 AI Summary
This work addresses a limitation of traditional iterative retrieval in multimodal multi-hop question answering: semantic anchoring, where previously retrieved evidence pulls later retrieval steps toward the same entities and yields redundant or insufficient evidence coverage. To overcome this, the authors propose GRAIL, a framework that rewrites queries implicitly through embedding subtraction. GRAIL combines context-aware subtractive query guidance, adaptive implicit localization, and dynamic task routing to enable task-aware hybrid retrieval and construct a noise-robust evidence pool. By mitigating semantic anchoring, the approach broadens the search horizon. On MultimodalQA, the hybrid framework reports a 40.3% macro-averaged performance gain.
📝 Abstract
In multimodal multi-hop question answering, we focus on the initial retrieval stage via two distinct tasks: (1) evidence set completion, retrieving missing evidence given context, and (2) sequential pool construction, iteratively building the top-$K$ pool from scratch. Under these settings, we point out that conventional iterative retrieval frameworks often suffer from Semantic Anchoring, where previously fetched evidence traps the retriever and yields entity-centric redundancy. To break this trap, we propose GRAIL (Gap-aware Retrieval via Adaptive Implicit Localization), a paradigm that performs implicit query rewriting directly at the embedding level. Through context-subtractive query steering, GRAIL excels at compositional cross-modal reasoning, while additive embedding updates show strength on localized information aggregation. By dynamically routing queries based on task type, our Hybrid Framework achieves a 40.3% macro-averaged performance gain on MultimodalQA. Extensive evaluations demonstrate that sequential GRAIL retrieves in a superior, noise-resilient manner, significantly expanding the search horizon through iterative gap-aware optimization.
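To make the contrast between subtractive and additive embedding updates concrete, here is a minimal sketch of sequential pool construction on toy vectors. It is not the authors' implementation: the abstract does not give GRAIL's update rule, so the centroid-based update, the weight `alpha`, and the names `steer_query` and `build_pool` are all assumptions chosen for illustration.

```python
import numpy as np


def normalize(v):
    """L2-normalize along the last axis (guards against zero vectors)."""
    norm = np.linalg.norm(v, axis=-1, keepdims=True)
    return v / np.clip(norm, 1e-12, None)


def steer_query(query, context, alpha=0.5, mode="subtract"):
    """Shift the query embedding using already-retrieved evidence.

    ASSUMPTION: the context is summarized by its centroid and applied with a
    fixed weight `alpha`; the paper's actual update may differ.
    """
    if len(context) == 0:
        return normalize(query)
    centroid = context.mean(axis=0)
    sign = -1.0 if mode == "subtract" else 1.0
    return normalize(query + sign * alpha * centroid)


def build_pool(query, corpus, k=2, alpha=0.5, mode="subtract"):
    """Greedily build a top-k evidence pool, re-steering the query each step."""
    corpus = normalize(np.asarray(corpus, dtype=float))
    query = np.asarray(query, dtype=float)
    taken = np.zeros(len(corpus), dtype=bool)
    picked = []
    for _ in range(min(k, len(corpus))):
        q = steer_query(query, corpus[taken], alpha, mode)
        scores = corpus @ q
        scores[taken] = -np.inf  # never retrieve the same item twice
        best = int(np.argmax(scores))
        picked.append(best)
        taken[best] = True
    return picked


# Toy corpus: items 0 and 1 are near-duplicates about one entity,
# item 2 covers the missing hop.
corpus = [[1.0, 0.0, 0.0], [0.98, 0.2, 0.0], [0.0, 0.0, 1.0]]
query = [0.8, 0.0, 0.6]

print(build_pool(query, corpus, k=2, mode="subtract"))  # -> [0, 2]
print(build_pool(query, corpus, k=2, mode="add"))       # -> [0, 1]
```

In this toy setup the additive update drifts toward the entity already retrieved and picks its near-duplicate, which mirrors the anchoring behaviour described above, while the subtractive update moves the query toward the uncovered part of the question. A task router in the spirit of the Hybrid Framework could simply choose `mode` per query, though how the paper makes that decision is not stated here.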
Problem

Research questions and friction points this paper is trying to address.

multimodal multi-hop QA
evidence retrieval
semantic anchoring
redundancy
gap-aware retrieval
Innovation

Methods, ideas, or system contributions that make the work stand out.

Gap-aware Retrieval
Implicit Query Rewriting
Context-Subtractive Steering
Multimodal Multi-hop QA
Semantic Anchoring