Sculpting the Vector Space: Towards Efficient Multi-Vector Visual Document Retrieval via Prune-then-Merge Framework

📅 2026-02-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the high computational and storage costs of multi-vector approaches to visual document retrieval, where existing compression methods struggle to preserve feature fidelity under aggressive compression. To overcome this limitation, the authors propose a two-stage Prune-then-Merge framework: it first adaptively prunes low-information image patches, then hierarchically merges the retained high-signal embeddings. Coordinating pruning and merging in one pipeline mitigates the noise-induced feature dilution seen in single-stage methods and substantially extends the range of near-lossless compression. Experiments across 29 visual document retrieval datasets show that the method remains robust at high compression ratios and consistently outperforms existing methods.

📝 Abstract

Visual Document Retrieval (VDR), which aims to retrieve relevant pages within vast corpora of visually-rich documents, is of significance in current multimodal retrieval applications. The state-of-the-art multi-vector paradigm excels in performance but suffers from prohibitive overhead, a problem that current efficiency methods like pruning and merging address imperfectly, creating a difficult trade-off between compression rate and feature fidelity. To overcome this dilemma, we introduce Prune-then-Merge, a novel two-stage framework that synergizes these complementary approaches. Our method first employs an adaptive pruning stage to filter out low-information patches, creating a refined, high-signal set of embeddings. Subsequently, a hierarchical merging stage compresses this pre-filtered set, effectively summarizing semantic content without the noise-induced feature dilution seen in single-stage methods. Extensive experiments on 29 VDR datasets demonstrate that our framework consistently outperforms existing methods, significantly extending the near-lossless compression range and providing robust performance at high compression ratios.
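
The two stages described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the abstract does not say how patches are scored or merged, so everything below is an assumption made for illustration. The per-patch `scores` input, the mean-plus-`alpha`-standard-deviations threshold, the greedy cosine-similarity agglomeration, and the `merge_factor` parameter are all hypothetical.

```python
import math
from statistics import mean, pstdev


def cosine(a, b):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def adaptive_prune(embeddings, scores, alpha=0.0):
    """Stage 1 (assumed rule): keep patches whose score reaches a
    per-page threshold of mean(scores) + alpha * std(scores)."""
    threshold = mean(scores) + alpha * pstdev(scores)
    kept = [e for e, s in zip(embeddings, scores) if s >= threshold]
    if not kept:
        # Never return an empty page: fall back to the top-scoring patch.
        best = max(range(len(scores)), key=scores.__getitem__)
        kept = [embeddings[best]]
    return kept


def hierarchical_merge(embeddings, target):
    """Stage 2 (assumed rule): greedy agglomerative merging. Repeatedly
    fuse the two most similar clusters into their size-weighted mean
    until only `target` vectors remain."""
    clusters = [(list(e), 1) for e in embeddings]  # (centroid, size)
    while len(clusters) > max(target, 1):
        best_pair, best_sim = None, -2.0
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = cosine(clusters[i][0], clusters[j][0])
                if sim > best_sim:
                    best_pair, best_sim = (i, j), sim
        i, j = best_pair
        (ci, ni), (cj, nj) = clusters[i], clusters[j]
        merged = [(x * ni + y * nj) / (ni + nj) for x, y in zip(ci, cj)]
        clusters[i] = (merged, ni + nj)
        del clusters[j]  # j > i, so index i is unaffected
    return [centroid for centroid, _ in clusters]


def prune_then_merge(embeddings, scores, merge_factor=2, alpha=0.0):
    """Prune first, then merge only the retained embeddings."""
    kept = adaptive_prune(embeddings, scores, alpha)
    target = max(1, math.ceil(len(kept) / merge_factor))
    return hierarchical_merge(kept, target)


# Toy page: four informative patches and two low-score "background" patches.
page = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.5, 0.5], [0.5, 0.5]]
page_scores = [0.9, 0.8, 0.9, 0.8, 0.1, 0.1]
compressed = prune_then_merge(page, page_scores, merge_factor=2)
print(len(page), "->", len(compressed), "vectors")
```

In this toy page, the two background patches lie midway between the informative groups. Pruning them first means the merged centroids are computed from the informative patches only, which is the dilution-avoidance argument the abstract makes for ordering the stages this way.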

Problem

Research questions and friction points this paper is trying to address.

Visual Document Retrieval

Multi-Vector

Efficiency

Compression

Feature Fidelity

Innovation

Methods, ideas, or system contributions that make the work stand out.

Prune-then-Merge

Multi-Vector Retrieval

Adaptive Pruning