Linear-Time and Constant-Memory Text Embeddings Based on Recurrent Language Models

📅 2026-04-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Transformer-based embedding models scale poorly to long texts: their compute grows quadratically and their memory linearly with input length. This work proposes a general vertically chunked inference method for recurrent language models such as Mamba2, RWKV, and xLSTM. Inference runs in linear time, and memory usage becomes constant in the input length once the input exceeds the chunk size. Fine-tuned Mamba2 models achieve competitive performance across multiple embedding benchmarks with a substantially smaller memory footprint than Transformer-based counterparts. The same runtime-memory trade-offs hold for all three architectures, which supports recurrent models as an efficient alternative for text embedding generation.

📝 Abstract

Transformer-based embedding models suffer from quadratic computational and linear memory complexity, limiting their utility for long sequences. We propose recurrent architectures as an efficient alternative, introducing a vertically chunked inference strategy that enables fast embedding generation with memory usage that becomes constant in the input length once it exceeds the vertical chunk size. By fine-tuning Mamba2 models, we demonstrate their viability as general-purpose text embedders, achieving competitive performance across a range of benchmarks while maintaining a substantially smaller memory footprint compared to transformer-based counterparts. We empirically validate the applicability of our inference strategy to Mamba2, RWKV, and xLSTM models, confirming consistent runtime-memory trade-offs across architectures and establishing recurrent models as a compelling alternative to transformers for efficient embedding generation.
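The abstract does not spell out the mechanics of vertically chunked inference, so the following is only a minimal sketch under one plausible reading: each chunk of tokens is pushed through every layer before the next chunk is read, and the only thing carried between chunks is one recurrent state per layer plus a running sum for pooling. The toy layer (`ToyRecurrentLayer`, an elementwise linear recurrence), the mean pooling, and all function names are assumptions made for illustration; they are not taken from the paper and do not stand in for Mamba2, RWKV, or xLSTM.

```python
import math


class ToyRecurrentLayer:
    """Hypothetical stand-in layer: h_t = decay * h_{t-1} + (1 - decay) * x_t."""

    def __init__(self, decay):
        self.decay = decay

    def forward(self, xs, state):
        # xs: list of token vectors; state: vector carried over from earlier tokens.
        ys = []
        for x in xs:
            state = [self.decay * h + (1.0 - self.decay) * v
                     for h, v in zip(state, x)]
            # Output mixes the squashed state with a residual connection.
            ys.append([math.tanh(h) + v for h, v in zip(state, x)])
        return ys, state


def embed_full(layers, tokens, dim):
    """Layer-by-layer pass: activations for the whole sequence are held at once."""
    xs = tokens
    for layer in layers:
        xs, _ = layer.forward(xs, [0.0] * dim)
    # Mean-pool the last layer's outputs (pooling choice is an assumption).
    return [sum(col) / len(xs) for col in zip(*xs)]


def embed_vertically_chunked(layers, tokens, dim, chunk_size):
    """Chunk-by-chunk pass: each chunk goes through all layers before the next."""
    states = [[0.0] * dim for _ in layers]  # one recurrent state per layer
    pooled = [0.0] * dim                    # running sum for mean pooling
    count = 0
    for start in range(0, len(tokens), chunk_size):
        xs = tokens[start:start + chunk_size]
        for i, layer in enumerate(layers):
            xs, states[i] = layer.forward(xs, states[i])
        for y in xs:
            pooled = [p + v for p, v in zip(pooled, y)]
        count += len(xs)
    return [p / count for p in pooled]


if __name__ == "__main__":
    layers = [ToyRecurrentLayer(0.9), ToyRecurrentLayer(0.5)]
    tokens = [[math.sin(0.1 * t + j) for j in range(4)] for t in range(50)]
    print(embed_full(layers, tokens, 4))
    print(embed_vertically_chunked(layers, tokens, 4, chunk_size=8))
```

In this sketch the chunked pass never holds more than `chunk_size` token activations per layer step, so its working memory stops growing once the input is longer than one chunk, while the total work stays proportional to the number of tokens. Both functions should agree up to floating-point rounding for any chunk size, because the recurrent state carries all the information from earlier chunks.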

Problem

Research questions and friction points this paper is trying to address.

text embeddings

long sequences

computational complexity

memory efficiency

transformer limitations

Innovation

Methods, ideas, or system contributions that make the work stand out.

recurrent language models

constant-memory embedding

vertically chunked inference