DocPruner: A Storage-Efficient Framework for Multi-Vector Visual Document Retrieval via Adaptive Patch-Level Embedding Pruning

πŸ“… 2025-09-28
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
In visual document retrieval (VDR), large vision-language models (LVLMs) adopt a multi-vector paradigm for fine-grained representation, but this requires storing hundreds of patch embeddings per page, imposing prohibitive storage overhead and hindering large-scale deployment. To address this, we propose the first adaptive embedding pruning framework guided by intra-document patch attention distributions: leveraging attention maps from intermediate LVLM layers, our method dynamically identifies and removes redundant patch embeddings while preserving semantic fidelity. Crucially, it requires no additional training or fine-tuning and is compatible with mainstream LVLM architectures. Evaluated across 14 standard VDR benchmarks, our approach achieves 50-60% average storage compression with only a marginal mAP degradation of 0.3-0.8 percentage points, substantially enhancing the deployment efficiency and scalability of LVLM-based VDR systems.

πŸ“ Abstract
Visual Document Retrieval (VDR), the task of retrieving visually-rich document pages using queries that combine visual and textual cues, is crucial for numerous real-world applications. Recent state-of-the-art methods leverage Large Vision-Language Models (LVLMs) in a multi-vector paradigm, representing each document as patch-level embeddings to capture fine-grained details. While highly effective, this approach introduces a critical challenge: prohibitive storage overhead, as storing hundreds of vectors per page makes large-scale deployment costly and impractical. To address this, we introduce DocPruner, the first framework to employ adaptive patch-level embedding pruning for VDR to effectively reduce the storage overhead. DocPruner leverages the intra-document patch attention distribution to dynamically identify and discard redundant embeddings for each document. This adaptive mechanism enables a significant 50-60% reduction in storage for leading multi-vector VDR models with negligible degradation in document retrieval performance. Extensive experiments across more than ten representative datasets validate that DocPruner offers a robust, flexible, and effective solution for building storage-efficient, large-scale VDR systems.
Problem

Research questions and friction points this paper is trying to address.

Reduces storage overhead in multi-vector visual document retrieval systems
Addresses prohibitive storage costs from patch-level embedding methods
Enables scalable deployment via adaptive pruning of redundant embeddings
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive pruning reduces storage for multi-vector retrieval
Leverages intra-document attention to discard redundant embeddings
Achieves 50-60% storage reduction with minimal performance loss
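The adaptive pruning idea in the bullets above can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the function name `prune_patch_embeddings`, the mean-plus-`alpha`-standard-deviations threshold rule, and the random stand-in attention scores are all assumptions; DocPruner itself derives scores from intermediate-layer attention maps of the LVLM.

```python
import numpy as np

def prune_patch_embeddings(embeddings, attention, alpha=0.0):
    """Drop patch embeddings whose attention score falls below a
    per-document adaptive threshold (illustrative sketch only)."""
    # Normalize attention over patches so scores form a distribution.
    scores = attention / attention.sum()
    # Hypothetical adaptive rule: patches scoring below
    # mean + alpha * std are treated as redundant for this document.
    threshold = scores.mean() + alpha * scores.std()
    keep = scores >= threshold
    return embeddings[keep], keep

rng = np.random.default_rng(0)
emb = rng.normal(size=(730, 128))   # ~hundreds of patch vectors per page
attn = rng.random(730)              # stand-in for LVLM attention scores
pruned, mask = prune_patch_embeddings(emb, attn)
print(f"kept {mask.sum()} / {mask.size} patches")
```

Because the threshold is computed per document rather than fixed globally, pages with flatter attention distributions retain more patches, which is what makes the compression rate adaptive.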
πŸ”Ž Similar Papers
No similar papers found.