🤖 AI Summary
This work addresses the challenges of fine-grained query-document interaction and long-context modeling in text re-ranking. It presents the first systematic evaluation of state space models (SSMs), specifically Mamba-1 and Mamba-2, as Transformer alternatives for this task. Experiments compare them against Transformer baselines across multiple scales, pretraining objectives, and with Flash Attention optimization. Results show that Mamba-2 achieves re-ranking effectiveness on par with parameter-matched Transformers while offering $O(1)$ per-token inference complexity, enabling significantly more efficient long-sequence processing. However, its end-to-end training and inference throughput remains lower than that of Flash Attention-accelerated Transformers. The study thus reveals both the promise and the limitations of SSMs in retrieval re-ranking, providing novel empirical evidence and technical insights for designing efficient, scalable retrieval models.
📝 Abstract
Transformers dominate NLP and IR, but their inference inefficiencies and challenges in extrapolating to longer contexts have sparked interest in alternative model architectures. Among these, state space models (SSMs) like Mamba offer promising advantages, particularly $O(1)$ time complexity in inference. Despite their potential, SSMs' effectiveness at text reranking, a task requiring fine-grained query-document interaction and long-context understanding, remains underexplored. This study benchmarks SSM-based architectures (specifically, Mamba-1 and Mamba-2) against transformer-based models across various scales, architectures, and pre-training objectives, focusing on performance and efficiency in text reranking tasks. We find that (1) Mamba architectures achieve competitive text ranking performance, comparable to transformer-based models of similar size; (2) they are less efficient in training and inference than transformers with flash attention; and (3) Mamba-2 outperforms Mamba-1 in both performance and efficiency. These results underscore the potential of state space models as a transformer alternative and highlight areas for improvement in future IR applications.
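To see where the $O(1)$ claim comes from: an SSM carries a fixed-size hidden state forward through the sequence, so generating each new token costs a constant amount of work, whereas attention must revisit all previous tokens. The toy recurrence below illustrates this with a diagonal linear state space model ($h_t = A h_{t-1} + B x_t$, $y_t = C h_t$); all dimensions and parameter values here are hypothetical for illustration and are not taken from Mamba itself.

```python
def ssm_step(h, x_t, A, B, C):
    """One recurrent state update. The work per token depends only on the
    state size, not the sequence length, so per-token inference is O(1)
    in sequence length -- unlike attention, which scans all prior tokens."""
    h = [a * hi + b * x_t for a, hi, b in zip(A, h, B)]  # diagonal A: elementwise update
    y_t = sum(c * hi for c, hi in zip(C, h))             # linear readout
    return h, y_t

# Toy dimensions and parameters (hypothetical, for illustration only).
d_state = 4
A = [0.9] * d_state   # diagonal state-transition coefficients
B = [1.0] * d_state   # input projection
C = [1.0] * d_state   # output projection

h = [0.0] * d_state   # fixed-size state, reused across the whole sequence
outputs = []
for x_t in [1.0, 0.5, -0.2]:  # a short scalar input sequence
    h, y_t = ssm_step(h, x_t, A, B, C)
    outputs.append(y_t)
```

During training, the same model can instead be computed in parallel over the full sequence (via a convolutional or scan formulation), which is what makes SSMs practical at scale; the recurrent form above is the one used at inference time.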