Retrievit: In-context Retrieval Capabilities of Transformers, State Space Models, and Hybrid Architectures

📅 2026-03-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates how to enhance model performance on contextual retrieval tasks—particularly those involving long-sequence modeling and positional reasoning—while maintaining computational efficiency. To this end, the authors design two synthetic benchmarks: n-gram retrieval and position retrieval, which enable systematic evaluation of Transformers, state space models (SSMs), and their hybrid architectures across data efficiency, length extrapolation, out-of-distribution robustness, and representation learning. The work reveals, for the first time, that SSMs and hybrid models can develop interpretable embedding structures with local awareness, matching or even surpassing Transformers in information-dense retrieval scenarios. Furthermore, it establishes clear architectural preferences: hybrid models outperform pure SSMs and match or exceed Transformers on n-gram retrieval, whereas Transformers retain an advantage in position-based retrieval tasks.

📝 Abstract
Transformers excel at in-context retrieval but suffer from quadratic complexity with sequence length, while State Space Models (SSMs) offer efficient linear-time processing but have limited retrieval capabilities. We investigate whether hybrid architectures combining Transformers and SSMs can achieve the best of both worlds on two synthetic in-context retrieval tasks. The first task, n-gram retrieval, requires the model to identify and reproduce the n-gram that follows the query within the input sequence. The second task, position retrieval, presents the model with a single query token and requires it to perform a two-hop associative lookup: first locating the corresponding element in the sequence, and then outputting its positional index. Under controlled experimental conditions, we assess data efficiency, length generalization, robustness to out-of-domain training examples, and learned representations across Transformers, SSMs, and hybrid architectures. We find that hybrid models outperform SSMs and match or exceed Transformers in data efficiency and extrapolation for information-dense context retrieval. However, Transformers maintain superiority in position retrieval tasks. Through representation analysis, we discover that SSM-based models develop locality-aware embeddings where tokens representing adjacent positions become neighbors in embedding space, forming interpretable structures. This emergent property, absent in Transformers, explains both the strengths and limitations of SSMs and hybrids for different retrieval tasks. Our findings provide principled guidance for architecture selection based on task requirements and reveal fundamental differences in how Transformers, SSMs, and hybrid models learn positional associations.
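The two synthetic tasks described in the abstract can be sketched as data generators. This is a minimal illustration under assumed conventions (vocabulary size, sequence length, and sampling scheme are hypothetical; the paper's exact formats may differ):

```python
import random

def ngram_retrieval_example(vocab_size=16, seq_len=32, n=3, seed=0):
    """n-gram retrieval (sketch): given a query n-gram that appears in the
    context, the target is the n-gram immediately following it."""
    rng = random.Random(seed)
    seq = [rng.randrange(vocab_size) for _ in range(seq_len)]
    # Pick a start so that a full following n-gram exists in the sequence.
    start = rng.randrange(seq_len - 2 * n)
    query = seq[start:start + n]
    target = seq[start + n:start + 2 * n]
    return seq, query, target

def position_retrieval_example(vocab_size=64, seq_len=32, seed=0):
    """Position retrieval (sketch): given a single query token present in
    the context, the target is its positional index -- a two-hop lookup
    (locate the token, then output its position)."""
    rng = random.Random(seed)
    # Unique tokens guarantee a single correct answer.
    seq = rng.sample(range(vocab_size), seq_len)
    pos = rng.randrange(seq_len)
    query = seq[pos]
    return seq, query, pos
```

Note the structural difference the paper exploits: n-gram retrieval rewards matching a local pattern and copying what comes next, while position retrieval additionally requires associating a token with an absolute positional index.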
Problem

Research questions and friction points this paper is trying to address.

in-context retrieval
n-gram retrieval
position retrieval
sequence modeling
architectural comparison
Innovation

Methods, ideas, or system contributions that make the work stand out.

in-context retrieval
hybrid architectures
State Space Models
representation analysis
length generalization