A Brief Comparison of Training-Free Multi-Vector Sequence Compression Methods

📅 2026-03-23
🤖 AI Summary
This work addresses the large index sizes of multi-vector retrieval models, which stem from their long embedding sequences and hinder practical deployment. The study evaluates training-free token compression strategies that shorten the token sequence of multi-vector document embeddings in order to lower memory overhead and query latency. Comparing token merging against token pruning, the authors find that merging offers the better trade-off: it shrinks the index while preserving retrieval performance more effectively than pruning does. These findings suggest that token merging is a practical way to make multi-vector retrieval more efficient while maintaining retrieval effectiveness.
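
To make the index-size concern concrete, here is a back-of-the-envelope sketch. Every number in it (corpus size, tokens per document, vector width, fp16 storage) is a hypothetical chosen for illustration and is not taken from the paper; the helper `index_size_bytes` is likewise an illustrative name.

```python
def index_size_bytes(num_docs, vectors_per_doc, dim, bytes_per_value=2):
    """Raw storage for an index holding `vectors_per_doc` vectors per document."""
    return num_docs * vectors_per_doc * dim * bytes_per_value


# Hypothetical corpus: 1M documents, 128-dim vectors stored as fp16 (2 bytes).
NUM_DOCS, DIM = 1_000_000, 128

single_vector = index_size_bytes(NUM_DOCS, 1, DIM)    # one vector per document
multi_vector = index_size_bytes(NUM_DOCS, 180, DIM)   # assumed 180 tokens per document
compressed = index_size_bytes(NUM_DOCS, 45, DIM)      # sequence shortened to 45 vectors

for name, size in [("single-vector", single_vector),
                   ("multi-vector", multi_vector),
                   ("compressed multi-vector", compressed)]:
    print(f"{name}: {size / 1e9:.2f} GB")
```

Under these assumptions the raw index grows linearly with the number of vectors kept per document, so shortening the sequence by a factor of four shrinks storage by the same factor.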

📝 Abstract
While multi-vector retrieval models outperform single-vector models of comparable size in retrieval quality, their practicality is limited by substantially larger index sizes, driven by the additional sequence-length dimension in their document embeddings. Because document embedding size dictates both memory overhead and query latency, compression is essential for deployment. In this work, we present an evaluation of training-free methods targeting the token sequence length, a dimension unique to multi-vector retrieval. Our findings suggest that token merging is strictly superior to token pruning for reducing index size while maintaining retrieval effectiveness.
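
The difference between the two families can be sketched on a toy example. This page does not specify which pruning or merging variants the paper evaluates, so the rules below (keep the first `keep` vectors; mean-pool contiguous groups) are assumptions chosen for brevity, and the MaxSim-style scorer is the common late-interaction formulation rather than a detail confirmed by the abstract. All function names are illustrative.

```python
import math


def normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def maxsim(query, doc):
    """For each query vector take its best dot product over doc vectors, then sum."""
    return sum(
        max(sum(q * d for q, d in zip(qv, dv)) for dv in doc)
        for qv in query
    )


def prune(doc, keep):
    """Illustrative pruning rule: keep the first `keep` vectors, drop the rest."""
    if not 1 <= keep <= len(doc):
        raise ValueError("keep must be between 1 and len(doc)")
    return doc[:keep]


def merge(doc, keep):
    """Illustrative merging rule: mean-pool contiguous groups into `keep` vectors."""
    if not 1 <= keep <= len(doc):
        raise ValueError("keep must be between 1 and len(doc)")
    n = len(doc)
    merged = []
    for g in range(keep):
        group = doc[g * n // keep:(g + 1) * n // keep]
        mean = [sum(col) / len(group) for col in zip(*group)]
        merged.append(normalize(mean))
    return merged


# Toy document: four 2-d token vectors; the query matches only the later tokens.
doc = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
query = [[0.0, 1.0]]

print(maxsim(query, doc))            # full sequence: 1.0
print(maxsim(query, prune(doc, 2)))  # pruned to 2 vectors: 0.0
print(maxsim(query, merge(doc, 2)))  # merged to 2 vectors: 1.0
```

Both compressed documents store two vectors instead of four. The toy case is constructed to show the mechanism, not to serve as evidence: pruning discards the dropped tokens outright, whereas merging keeps a pooled representative of every group.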
Problem

Research questions and friction points this paper is trying to address.

multi-vector retrieval
sequence compression
index size
token sequence length
retrieval efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

training-free
multi-vector retrieval
token merging
sequence compression
index size reduction
Rohan Jha
Johns Hopkins University, 3400 N. Charles St., Baltimore, MD 21218, USA
Chunsheng Zuo
Johns Hopkins University, 3400 N. Charles St., Baltimore, MD 21218, USA
Reno Kriz
Associate Research Scientist
information extraction, video retrieval, text simplification, large language models
Benjamin Van Durme
Johns Hopkins University / Microsoft
Linguistics, Natural Language Processing, Artificial Intelligence