🤖 AI Summary
This work proposes a novel approach to late interaction retrieval by integrating an attention mechanism into the ColBERT framework to better capture fine-grained relevance signals between query and document terms. Unlike existing late interaction models, which treat term-level similarities uniformly, our method explicitly models the importance weights between individual terms and uses them to perform a weighted aggregation of local similarity scores, thereby enhancing semantic relevance modeling. By combining pretrained language model embeddings, late interaction, and learnable attention weights, the proposed architecture achieves significant improvements in recall across multiple retrieval benchmarks, including MS-MARCO, BEIR, and LoTTE, demonstrating both its effectiveness and generalizability.
📝 Abstract
Vector embeddings from pre-trained language models form a core component of Neural Information Retrieval systems across a multitude of knowledge extraction tasks. The late interaction paradigm, introduced in ColBERT, achieves high accuracy together with runtime efficiency. However, the current formulation does not take into account the attention weights of query and document terms, which intuitively capture the "importance" of the similarities between them and could lead to a better model of relevance between queries and documents. This work proposes ColBERT-Att, which explicitly integrates an attention mechanism into the late interaction framework for enhanced retrieval performance. Empirical evaluation shows that ColBERT-Att improves recall on MS-MARCO as well as on a wide range of BEIR and LoTTE benchmark datasets.
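To make the contrast concrete, the following is a minimal NumPy sketch of the standard ColBERT late interaction (MaxSim) score alongside a hypothetical attention-weighted variant of the kind the abstract describes. The function names, the use of a softmax over per-query-term weights, and the weight vector `w` are illustrative assumptions, not the authors' actual architecture.

```python
import numpy as np

def maxsim_score(Q, D):
    """Standard ColBERT late interaction: for each query term embedding,
    take the max similarity over all document term embeddings, then sum.
    Q: (num_query_terms, dim), D: (num_doc_terms, dim), rows unit-normalized."""
    sims = Q @ D.T                      # pairwise term similarities
    return float(sims.max(axis=1).sum())

def attention_weighted_score(Q, D, w):
    """Illustrative attention-weighted variant (assumed, not the paper's exact
    formulation): learnable per-query-term weights `w` are normalized with a
    softmax and used for a weighted aggregation of the per-term max similarities."""
    sims = Q @ D.T
    per_term = sims.max(axis=1)         # best match for each query term
    attn = np.exp(w) / np.exp(w).sum()  # softmax turns weights into importances
    return float(attn @ per_term)       # weighted instead of uniform sum
```

With uniform weights (`w` all zeros) the weighted variant reduces to the plain MaxSim mean; non-uniform weights let the model emphasize query terms whose matches matter most for relevance.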