MAB-DQA: Addressing Query Aspect Importance in Document Question Answering with Multi-Armed Bandits

📅 2026-04-10
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing multimodal document question answering systems often retain only a small number of candidate pages during retrieval (e.g., Top-4), thereby overlooking information-rich but visually non-salient content that is critical to accurate answers. This work integrates a multi-armed bandit mechanism into the task: it models the importance of multiple implicit aspects within a query by decomposing the query into aspect-aware subqueries, each treated as an arm. Rewards derived from preliminary reasoning over a few representative pages estimate each aspect's utility, and an exploration–exploitation policy dynamically reallocates the retrieval budget toward high-value aspects within the retrieval-augmented generation pipeline. Evaluated on four benchmarks, the approach improves on the state-of-the-art method by 5%–18% on average, making better use of non-salient but essential content and improving overall question answering performance.
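The summary does not spell out how a reallocated budget becomes a concrete set of pages. A minimal sketch, assuming per-aspect utility scores are already available and that a fixed page budget is split in proportion to them: the proportional, largest-remainder rule and the names `allocate_budget` and `select_pages` are hypothetical illustrations, not taken from the paper.

```python
from typing import Dict, List


def allocate_budget(utilities: Dict[str, float], total: int) -> Dict[str, int]:
    """Split `total` page slots across aspects in proportion to utility.

    Hypothetical helper: the proportional, largest-remainder rule is an
    assumption for illustration, not MAB-DQA's published allocation rule.
    """
    weights = {a: max(u, 0.0) for a, u in utilities.items()}
    mass = sum(weights.values())
    if mass == 0.0:
        # No usable signal yet: fall back to an even split.
        weights = {a: 1.0 for a in weights}
        mass = float(len(weights))
    shares = {a: total * w / mass for a, w in weights.items()}
    quotas = {a: int(s) for a, s in shares.items()}  # floor of each share
    leftover = total - sum(quotas.values())
    # Give the remaining slots to the aspects with the largest fractional parts.
    order = sorted(shares, key=lambda a: shares[a] - quotas[a], reverse=True)
    for a in order[:leftover]:
        quotas[a] += 1
    return quotas


def select_pages(candidates: Dict[str, List[int]],
                 quotas: Dict[str, int]) -> List[int]:
    """Take each aspect's top-ranked pages up to its quota, skipping repeats.

    Larger quotas are served first; if an aspect runs out of unseen pages,
    its unused slots are simply dropped in this sketch.
    """
    chosen: List[int] = []
    seen = set()
    for aspect, quota in sorted(quotas.items(), key=lambda kv: -kv[1]):
        taken = 0
        for page in candidates.get(aspect, []):
            if taken >= quota:
                break
            if page not in seen:
                seen.add(page)
                chosen.append(page)
                taken += 1
    return chosen
```

With a Top-4 budget and utilities of 0.7, 0.2 and 0.1 for three hypothetical aspects, the quotas come out as 3, 1 and 0. A page retrieved by two aspects is counted only once, so the lower-ranked aspect contributes a different page instead of a duplicate.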

📝 Abstract
Document Question Answering (DQA) involves generating answers from a document based on a user's query and is a key task in document understanding. Because the task requires interpreting visual layouts, recent studies have adopted multimodal Retrieval-Augmented Generation (RAG), which processes page images for answer generation. However, in multimodal RAG, visual DQA struggles to use a large number of images effectively: the retrieval stage often retains only a few candidate pages (e.g., Top-4), so informative but less visually salient content is overlooked in favor of common yet low-information pages. To address this issue, we propose a Multi-Armed Bandit-based DQA framework (MAB-DQA) that explicitly models the varying importance of multiple implicit aspects in a query. Specifically, MAB-DQA decomposes a query into aspect-aware subqueries and retrieves an aspect-specific candidate set for each. It treats each subquery as an arm and uses preliminary reasoning results from a small number of representative pages as reward signals to estimate aspect utility. Guided by an exploration-exploitation policy, MAB-DQA dynamically reallocates retrieval budgets toward high-value aspects. Using the most informative pages and their correlations, MAB-DQA then generates the final answer. On four benchmarks, MAB-DQA shows an average improvement of 5%–18% over the state-of-the-art method, consistently enhancing document understanding. Code is available at https://github.com/ElephantOH/MAB-DQA.
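The abstract names an exploration-exploitation policy but not a specific algorithm. As an illustration only, the sketch below swaps in UCB1, a standard bandit algorithm, with one arm per aspect-aware subquery. The function name `ucb1` and the callback `reward_fn` are hypothetical; `reward_fn` stands in for the score of a preliminary reasoning pass over an aspect's representative pages and is assumed to return a value in [0, 1].

```python
import math
import random
from typing import Callable, List, Tuple


def ucb1(n_arms: int, rounds: int,
         reward_fn: Callable[[int], float]) -> Tuple[List[int], List[float]]:
    """UCB1 over `n_arms` arms (here: one arm per aspect-aware subquery).

    Returns per-arm pull counts and empirical mean rewards. Rewards are
    assumed to lie in [0, 1]. UCB1 is a stand-in chosen for illustration,
    not necessarily the policy MAB-DQA uses.
    """
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, rounds + 1):
        if t <= n_arms:
            arm = t - 1  # pull every arm once so each estimate is defined
        else:
            # Favor high mean reward, plus a bonus for rarely pulled arms.
            arm = max(range(n_arms),
                      key=lambda i: means[i] + math.sqrt(2.0 * math.log(t) / counts[i]))
        reward = reward_fn(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running average
    return counts, means


if __name__ == "__main__":
    # Hypothetical probe: arm 0 is the informative aspect. Per the abstract,
    # the real reward would come from preliminary reasoning over pages.
    rng = random.Random(0)
    true_utility = [0.8, 0.3, 0.1]

    def probe(arm: int) -> float:
        return 1.0 if rng.random() < true_utility[arm] else 0.0

    counts, means = ucb1(3, 200, probe)
    print(counts, [round(m, 2) for m in means])
```

Arms whose probes keep paying off accumulate most of the pulls, while the square-root bonus keeps every arm sampled occasionally. This mirrors the behavior the abstract describes as reallocating budget toward high-value aspects without abandoning the others outright.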
Problem

Research questions and friction points this paper is trying to address.

Document Question Answering
Multimodal RAG
Query Aspect Importance
Retrieval Budget Allocation
Visual Layout Understanding
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Armed Bandits
Document Question Answering
Aspect-aware Retrieval
Multimodal RAG
Dynamic Budget Allocation