Hybrid-Vector Retrieval for Visually Rich Documents: Combining Single-Vector Efficiency and Multi-Vector Accuracy

📅 2025-10-25

🤖 AI Summary

Visually rich document retrieval faces a trade-off: single-vector methods are efficient but coarse, while multi-vector methods are accurate but computationally expensive. This paper proposes HEAVEN, a two-stage hybrid-vector framework. The first stage runs efficient single-vector coarse retrieval over Visually-Summarized Pages (VS-Pages). The second stage reranks the candidates with a multi-vector method, filtering query tokens by linguistic importance to cut redundant computation. The paper also introduces ViMDOC, described as the first benchmark for visually rich, multi-document, and long-document retrieval. Averaged across four benchmarks, HEAVEN attains 99.87% of the Recall@1 of multi-vector models while reducing per-query computation by 99.82%.

📝 Abstract

Retrieval over visually rich documents is essential for tasks such as legal discovery, scientific search, and enterprise knowledge management. Existing approaches fall into two paradigms: single-vector retrieval, which is efficient but coarse, and multi-vector retrieval, which is accurate but computationally expensive. To address this trade-off, we propose HEAVEN, a two-stage hybrid-vector framework. In the first stage, HEAVEN efficiently retrieves candidate pages using a single-vector method over Visually-Summarized Pages (VS-Pages), which assemble representative visual layouts from multiple pages. In the second stage, it reranks candidates with a multi-vector method while filtering query tokens by linguistic importance to reduce redundant computations. To evaluate retrieval systems under realistic conditions, we also introduce ViMDOC, the first benchmark for visually rich, multi-document, and long-document retrieval. Across four benchmarks, HEAVEN attains 99.87% of the Recall@1 performance of multi-vector models on average while reducing per-query computation by 99.82%, achieving efficiency and accuracy. Our code and datasets are available at: https://github.com/juyeonnn/HEAVEN
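The two-stage flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the page dictionary layout, and the use of cosine similarity with a MaxSim-style late-interaction score are all assumptions made for illustration, and the construction of VS-Pages is omitted entirely.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def maxsim(query_tokens, page_tokens):
    """Assumed multi-vector score: for each query token vector, take its
    best match among the page token vectors, then sum over query tokens."""
    return sum(max(cosine(q, p) for p in page_tokens) for q in query_tokens)

def two_stage_retrieve(query_vec, query_tokens, pages, k=2):
    """Hypothetical pipeline. Each page is a dict with keys
    'id', 'vec' (one vector) and 'tokens' (a list of vectors)."""
    # Stage 1: cheap single-vector scoring over every page, keep the top k.
    candidates = sorted(
        pages, key=lambda p: cosine(query_vec, p["vec"]), reverse=True
    )[:k]
    # Stage 2: costlier multi-vector scoring, applied only to the k candidates.
    return sorted(
        candidates, key=lambda p: maxsim(query_tokens, p["tokens"]), reverse=True
    )
```

The point of the design is that the expensive token-level scoring touches only k pages rather than the whole collection, so the candidate count k controls how much accuracy is traded for speed.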

Problem

Research questions and friction points this paper is trying to address.

Balancing retrieval efficiency and accuracy for visually rich documents

Reducing computational costs in multi-vector retrieval systems

Creating a benchmark for realistic multi-document visual retrieval

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid-vector framework combining single-vector and multi-vector retrieval

Two-stage process with efficient candidate retrieval and reranking

Filtering query tokens by linguistic importance to reduce computation
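To see why filtering query tokens reduces reranking cost, consider the sketch below. This summary does not specify how the paper measures linguistic importance, so a small hypothetical stopword list stands in for it here; the function names and the cost formula (one similarity per query token, page token, and candidate) are likewise assumptions for illustration.

```python
# Hypothetical stand-in for "linguistic importance": drop common function words.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "for", "to",
             "is", "are", "what", "which", "and"}

def filter_query_tokens(tokens, token_vecs):
    """Keep only (token, vector) pairs whose token is not a stopword."""
    pairs = list(zip(tokens, token_vecs))
    kept = [(t, v) for t, v in pairs if t.lower() not in STOPWORDS]
    # Fall back to the full query if every token was filtered out.
    return kept if kept else pairs

def similarity_ops(num_query_tokens, num_page_tokens, num_candidates):
    """Token-to-token similarity computations for a late-interaction rerank."""
    return num_query_tokens * num_page_tokens * num_candidates
```

Under this assumed cost model, the reranking work scales linearly with the number of query tokens kept, so dropping low-importance tokens shrinks the cost in direct proportion.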
