🤖 AI Summary
This work addresses the high indexing overhead of late-interaction retrieval models such as ColBERT, which store a dense embedding for every document token. To mitigate this, the authors propose a token pruning method grounded in geometric estimation via Voronoi cells. For the first time, Voronoi regions in the high-dimensional embedding space are used to quantify token importance by measuring each token's region of influence, yielding a formal and interpretable pruning criterion. Experiments across multiple retrieval benchmarks show that the proposed strategy substantially reduces index size while maintaining, and in some cases improving, retrieval effectiveness. The approach also serves as an interpretable analytical tool for understanding token-level contributions in late-interaction architectures.
📝 Abstract
Late-interaction models like ColBERT offer competitive performance across various retrieval tasks, but require storing a dense embedding for each document token, leading to substantial index storage overhead. Prior work addresses this by pruning low-importance token embeddings based on statistical and empirical measures, but these methods often either lack formal grounding or prove ineffective. To address these shortcomings, we introduce a framework grounded in hyperspace geometry and cast token pruning as a Voronoi cell estimation problem in the embedding space. By interpreting each token's influence as the measure of its Voronoi region, our approach enables principled pruning that retains retrieval quality while reducing index size. Our experiments demonstrate that this approach serves not only as a competitive pruning strategy but also as a valuable tool for improving and interpreting token-level behavior within dense retrieval systems.
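To make the core idea concrete, here is a minimal sketch of Voronoi-measure-based token pruning. The paper's actual estimator is not specified in this summary, so the sketch makes two labeled assumptions: token embeddings live on the unit hypersphere (as in ColBERT's cosine-similarity setup), and each token's Voronoi measure is approximated by Monte Carlo sampling — drawing random directions on the sphere and counting how many fall nearest to each token. The function names `voronoi_token_importance` and `prune_tokens` are illustrative, not from the paper.

```python
import numpy as np

def voronoi_token_importance(token_embs, n_samples=10000, seed=0):
    """Monte Carlo estimate of each token's Voronoi cell measure on the
    unit hypersphere (an illustrative stand-in for the paper's estimator):
    sample random directions, assign each to the nearest token embedding
    (max dot product), and count assignments per token."""
    rng = np.random.default_rng(seed)
    d = token_embs.shape[1]
    # Normalize token embeddings; late-interaction models like ColBERT
    # typically score with cosine/dot similarity over unit vectors.
    embs = token_embs / np.linalg.norm(token_embs, axis=1, keepdims=True)
    # Gaussian samples normalized to the sphere are uniform on it.
    samples = rng.standard_normal((n_samples, d))
    samples /= np.linalg.norm(samples, axis=1, keepdims=True)
    nearest = (samples @ embs.T).argmax(axis=1)
    counts = np.bincount(nearest, minlength=embs.shape[0])
    # Fraction of the sphere "owned" by each token = its influence.
    return counts / n_samples

def prune_tokens(token_embs, keep_ratio=0.5, n_samples=10000, seed=0):
    """Keep the tokens with the largest estimated Voronoi measure,
    discarding low-influence embeddings to shrink the index."""
    importance = voronoi_token_importance(token_embs, n_samples, seed)
    k = max(1, int(len(importance) * keep_ratio))
    keep = np.argsort(importance)[::-1][:k]
    return np.sort(keep)
```

Under this reading, a token whose embedding sits close to a near-duplicate owns a small Voronoi cell and contributes little discriminative signal, so it is a natural pruning candidate; tokens with large cells dominate more of the similarity space and are kept.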