🤖 AI Summary
This work proposes LACONIC, a family of learned sparse retrieval models based on Llama-3 (1B/3B/8B), to bridge the performance gap between sparse and dense retrievers while maintaining computational efficiency. Although dense retrieval models achieve strong performance, their high computational and memory demands hinder efficient deployment on CPUs; in contrast, traditional sparse models are efficient but underperform. LACONIC employs a two-stage training strategy: first, weakly supervised pre-finetuning adapts the causal language model to produce bidirectional contextual representations; then the model is fine-tuned with high-quality hard negatives. The LACONIC-8B variant achieves an nDCG of 60.2 on the MTEB Retrieval benchmark, ranking 15th on the leaderboard, while reducing index memory usage by 71% compared to an equivalent dense model. This enables efficient CPU-based inference without sacrificing retrieval effectiveness.
📝 Abstract
While dense retrieval models have become the standard for state-of-the-art information retrieval, their deployment is often constrained by high memory requirements and reliance on GPU accelerators for vector similarity search. Learned sparse retrieval offers a compelling alternative by enabling efficient search via inverted indices, yet it has historically received less attention than dense approaches. In this report, we introduce LACONIC, a family of learned sparse retrievers based on the Llama-3 architecture (1B, 3B, and 8B). We propose a streamlined two-phase training curriculum consisting of (1) weakly supervised pre-finetuning to adapt causal LLMs for bidirectional contextualization and (2) high-signal finetuning using curated hard negatives. Our results demonstrate that LACONIC effectively bridges the performance gap with dense models: the 8B variant achieves a strong 60.2 nDCG on the MTEB Retrieval benchmark, ranking 15th on the leaderboard as of January 1, 2026, while using 71% less index memory than an equivalent dense model. By delivering high retrieval effectiveness on commodity CPU hardware with a fraction of the compute budget required by competing models, LACONIC provides a scalable and efficient solution for real-world search applications.
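To make the efficiency argument concrete, the following is a minimal sketch of how a learned sparse retriever can be served with an inverted index: each document and query is a sparse map from vocabulary tokens to learned weights, and scoring touches only the posting lists of tokens that appear in the query. The vocabulary, weights, and documents below are purely illustrative and are not taken from the paper; LACONIC's actual encoder and index format are not shown.

```python
from collections import defaultdict

# Hypothetical sparse vectors (token -> learned weight), as a learned
# sparse retriever would emit for each document. Values are made up
# for illustration only.
docs = {
    "d1": {"llama": 1.2, "sparse": 0.8, "retrieval": 1.5},
    "d2": {"dense": 1.1, "retrieval": 0.9, "gpu": 0.4},
}

# Build an inverted index: token -> list of (doc_id, weight) postings.
index = defaultdict(list)
for doc_id, vec in docs.items():
    for token, weight in vec.items():
        index[token].append((doc_id, weight))

def search(query_vec):
    """Score documents by a sparse dot product, visiting only the
    posting lists of tokens present in the query."""
    scores = defaultdict(float)
    for token, q_weight in query_vec.items():
        for doc_id, d_weight in index[token]:
            scores[doc_id] += q_weight * d_weight
    # Highest-scoring documents first.
    return sorted(scores.items(), key=lambda kv: -kv[1])

results = search({"sparse": 1.0, "retrieval": 0.7})
```

Because only query-token posting lists are traversed, this kind of search runs efficiently on CPUs and stores only the nonzero weights, which is the source of the index-memory savings the report highlights.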