🤖 AI Summary
This work addresses the challenges of fine-grained query-document interaction and long-context modeling in text re-ranking. It presents the first systematic evaluation of state space models (SSMs), specifically Mamba-1 and Mamba-2, as Transformer alternatives for this task. Experiments compare them against Transformer baselines across multiple scales, pretraining objectives, and with Flash Attention optimization. Results show that Mamba-2 achieves re-ranking effectiveness on par with parameter-matched Transformers while offering $O(1)$ per-token inference complexity, enabling significantly more efficient long-sequence processing. However, its end-to-end training and inference throughput remains lower than that of Flash Attention-accelerated Transformers. The study thus reveals both the promise and the limitations of SSMs in retrieval re-ranking, providing novel empirical evidence and technical insights for designing efficient, scalable retrieval models.
📝 Abstract
Transformers dominate NLP and IR, but their inference inefficiencies and challenges in extrapolating to longer contexts have sparked interest in alternative model architectures. Among these, state space models (SSMs) like Mamba offer promising advantages, particularly $O(1)$ time complexity in inference. Despite their potential, SSMs' effectiveness at text reranking, a task requiring fine-grained query-document interaction and long-context understanding, remains underexplored. This study benchmarks SSM-based architectures (specifically, Mamba-1 and Mamba-2) against transformer-based models across various scales, architectures, and pre-training objectives, focusing on performance and efficiency in text reranking tasks. We find that (1) Mamba architectures achieve competitive text ranking performance, comparable to transformer-based models of similar size; (2) they are less efficient in training and inference than transformers with flash attention; and (3) Mamba-2 outperforms Mamba-1 in both performance and efficiency. These results underscore the potential of state space models as a transformer alternative and highlight areas for improvement in future IR applications.
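To see where the $O(1)$ claim comes from: an SSM carries a fixed-size hidden state forward through the sequence, so generating each new token costs a constant amount of work, whereas attention must revisit all previous tokens. The toy recurrence below illustrates this with a diagonal linear state space model ($h_t = A h_{t-1} + B x_t$, $y_t = C h_t$); all dimensions and parameter values here are hypothetical for illustration and are not taken from Mamba itself.

```python
def ssm_step(h, x_t, A, B, C):
    """One recurrent state update. The work per token depends only on the
    state size, not the sequence length, so per-token inference is O(1)
    in sequence length -- unlike attention, which scans all prior tokens."""
    h = [a * hi + b * x_t for a, hi, b in zip(A, h, B)]  # diagonal A: elementwise update
    y_t = sum(c * hi for c, hi in zip(C, h))             # linear readout
    return h, y_t

# Toy dimensions and parameters (hypothetical, for illustration only).
d_state = 4
A = [0.9] * d_state   # diagonal state-transition coefficients
B = [1.0] * d_state   # input projection
C = [1.0] * d_state   # output projection

h = [0.0] * d_state   # fixed-size state, reused across the whole sequence
outputs = []
for x_t in [1.0, 0.5, -0.2]:  # a short scalar input sequence
    h, y_t = ssm_step(h, x_t, A, B, C)
    outputs.append(y_t)
```

During training, the same model can instead be computed in parallel over the full sequence (via a convolutional or scan formulation), which is what makes SSMs practical at scale; the recurrent form above is the one used at inference time.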