No More K-means: Single-Stage Sparse Coding for Efficient Multi-Vector Retrieval

📅 2026-05-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the high indexing latency, semantic information loss, and limited storage and retrieval efficiency of existing multi-vector retrieval models that rely on K-means clustering. To overcome these limitations, the paper proposes Single-Stage Sparse Retrieval (SSR), a novel approach that introduces sparse autoencoders into multi-vector retrieval for the first time. SSR maps token embeddings directly to high-dimensional sparse representations, which eliminates the conventional clustering step and allows retrieval to run end to end over an inverted index. Evaluated on the BEIR benchmark, SSR reduces indexing time by 15× relative to ColBERTv2 and halves retrieval latency, while also improving retrieval effectiveness over leading baselines.
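To make the idea of mapping a token embedding to a high-dimensional sparse representation concrete, here is a minimal sketch of one common sparse-autoencoder encoder design: a linear layer, a ReLU, and a top-k selection. The summary does not specify SSR's encoder, so the function name `topk_sparse_encode`, the top-k rule, the random weights, and the toy sizes are all assumptions made for illustration, not the paper's implementation.

```python
import heapq
import random


def topk_sparse_encode(embedding, weights, bias, k):
    """Map one dense token embedding to a sparse code {feature_id: activation}.

    Hypothetical encoder (an assumption, not the paper's design):
    linear layer, then ReLU, then keep only the k largest activations.
    """
    activations = []
    for feature_id, (row, b) in enumerate(zip(weights, bias)):
        pre = sum(w * x for w, x in zip(row, embedding)) + b
        if pre > 0.0:  # ReLU: non-positive activations are dropped
            activations.append((pre, feature_id))
    top = heapq.nlargest(k, activations)  # the k strongest features
    return {feature_id: value for value, feature_id in top}


# Toy sizes; a trained SAE would use learned weights and a far larger dictionary.
random.seed(0)
dim, dict_size, k = 16, 256, 4
weights = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(dict_size)]
bias = [0.0] * dict_size
token = [random.gauss(0, 1) for _ in range(dim)]

code = topk_sparse_encode(token, weights, bias, k)
print(sorted(code))  # ids of the (at most k) active features
```

Because each token activates only a handful of feature ids, those ids can serve directly as keys in an inverted index, which is what removes the need for a separate clustering pass.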

📝 Abstract

Multi-vector retrieval (MVR) models, exemplified by ColBERT, have established new benchmarks in retrieval accuracy by preserving fine-grained token-level interactions. However, this granularity imposes prohibitive storage and retrieval efficiency bottlenecks: to manage the immense memory footprint and computational overhead of billion-scale token vectors, state-of-the-art systems are forced to rely on aggressive dimension reduction and complex clustering (e.g., K-means). This compromise introduces two critical limitations: excessive indexing latency from clustering large-scale corpora and semantic information loss inherent to compression. In this paper, we propose Single-stage Sparse Retrieval (SSR), a paradigm shift that replaces expensive clustering with efficient sparse coding. Instead of compressing features into low-dimensional dense vectors, we utilize a Sparse Autoencoder (SAE) to project token embeddings into a high-dimensional but highly sparse representation. This transformation enables us to bypass vector clustering entirely and leverage inverted indexing for precise, high-throughput retrieval. Extensive experiments on the BEIR benchmark demonstrate that SSR achieves a "trifecta" of improvements: it reduces indexing time by 15x compared to ColBERTv2, halves retrieval latency, and simultaneously improves retrieval performance over leading baselines.
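The abstract's claim that sparse token codes can be served from an inverted index can be illustrated with a small sketch. It assumes each token is already encoded as a `{feature_id: weight}` dictionary and scores documents with a ColBERT-style late interaction (sum over query tokens of the best-matching document token). The abstract does not state SSR's scoring rule, so that rule, the helper names `build_inverted_index` and `search`, and the toy data are assumptions for illustration only.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """docs: {doc_id: [sparse token code, ...]}, each code a {feature_id: weight} dict."""
    index = defaultdict(list)  # feature_id -> [(doc_id, token_pos, weight), ...]
    for doc_id, token_codes in docs.items():
        for pos, code in enumerate(token_codes):
            for feature_id, weight in code.items():
                index[feature_id].append((doc_id, pos, weight))
    return index


def search(index, query_codes):
    """Assumed scoring: sum over query tokens of the max sparse dot product per document."""
    scores = defaultdict(float)
    for q_code in query_codes:
        sims = defaultdict(float)  # (doc_id, token_pos) -> dot product with this query token
        for feature_id, q_weight in q_code.items():
            # Only postings that share an active feature are ever visited.
            for doc_id, pos, d_weight in index.get(feature_id, ()):
                sims[(doc_id, pos)] += q_weight * d_weight
        best = defaultdict(float)  # doc_id -> best-matching token similarity
        for (doc_id, _pos), sim in sims.items():
            best[doc_id] = max(best[doc_id], sim)
        for doc_id, sim in best.items():
            scores[doc_id] += sim
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy corpus: two documents with two tokens each (weights are made up).
docs = {
    "d1": [{3: 1.0, 7: 0.5}, {2: 0.8}],
    "d2": [{7: 0.9}, {5: 1.2}],
}
index = build_inverted_index(docs)
ranking = search(index, [{3: 1.0}, {7: 1.0}])
print(ranking)  # [('d1', 1.5), ('d2', 0.9)]
```

Indexing here is a single pass that appends postings, with no centroid training, which is the structural reason a sparse-coding index can be built without a clustering stage.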

Problem

Research questions and friction points this paper is trying to address.

multi-vector retrieval

storage efficiency

retrieval latency

semantic information loss

clustering overhead

Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse Coding

Multi-Vector Retrieval

Inverted Indexing