Fantastic (small) Retrievers and How to Train Them: mxbai-edge-colbert-v0 Tech Report

📅 2025-10-16
🤖 AI Summary
Motivated by the need for lightweight, efficient retrieval models deployable on edge devices, this paper introduces mxbai-edge-colbert-v0, the first ultra-compact late-interaction model family designed for full-spectrum retrieval, from cloud to edge. The approach combines knowledge distillation, architectural simplification, and multi-round ablation-driven parameter compression to yield two variants with only 17M and 32M parameters. Evaluated on BEIR and other standard benchmarks, mxbai-edge-colbert-v0 consistently outperforms ColBERTv2, especially on long-context tasks, with substantial gains in retrieval accuracy. It also delivers a several-fold speedup in inference latency and a significantly reduced memory footprint. Crucially, this work demonstrates, for the first time, that ultra-lightweight late-interaction models can retain strong contextual representation capability without compromising effectiveness. As such, mxbai-edge-colbert-v0 establishes a practical, deployable foundation for on-device semantic search.
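The summary names knowledge distillation as one of the compression ingredients. The report itself is not quoted here, so as a hedged illustration only: a common way to distill a retriever is to match the student's score distribution over a set of candidate documents to the teacher's, via a KL-divergence loss. The function name `kl_distill_loss` and the temperature parameter are assumptions for this sketch, not the paper's actual training recipe.

```python
import numpy as np

def kl_distill_loss(teacher_scores, student_scores, temperature=1.0):
    """KL(teacher || student) over a softmax of per-candidate relevance scores.

    A generic score-distillation objective sketch; the real mxbai training
    setup may differ (loss weighting, in-batch negatives, etc.).
    """
    def softmax(x):
        x = np.asarray(x, dtype=float) / temperature
        x = x - x.max()          # numerical stability
        e = np.exp(x)
        return e / e.sum()

    p = softmax(teacher_scores)  # teacher's distribution over candidates
    q = softmax(student_scores)  # student's distribution over candidates
    # Small epsilon guards against log(0).
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))
```

When the student reproduces the teacher's ranking distribution exactly, the loss is zero; any divergence is penalized, which is what lets a 17M-parameter student inherit behavior from a much larger teacher.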

📝 Abstract
In this work, we introduce the mxbai-edge-colbert-v0 models at two parameter counts: 17M and 32M. As part of our research, we conduct numerous experiments to improve retrieval and late-interaction models, which we intend to distill into smaller models as proofs of concept. Our ultimate aim is to support retrieval at all scales, from large-scale retrieval that lives in the cloud to models that can run locally on any device. mxbai-edge-colbert-v0 is a model that we hope will serve as a solid foundation for all future experiments, representing the first version of a long series of small proofs of concept. As part of the development of mxbai-edge-colbert-v0, we conducted multiple ablation studies, whose results we report. In terms of downstream performance, mxbai-edge-colbert-v0 is a particularly capable small model, outperforming ColBERTv2 on common short-text benchmarks (BEIR) and representing a large step forward in long-context tasks, with unprecedented efficiency.
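Both the summary and the abstract center on "late interaction," the ColBERT-family scoring scheme in which queries and documents keep one embedding per token and relevance is the sum, over query tokens, of each token's best match against the document's tokens (MaxSim). The report's own code is not reproduced here; this is a minimal NumPy sketch of the standard MaxSim operator, with `maxsim_score` as an illustrative name.

```python
import numpy as np

def maxsim_score(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """ColBERT-style late-interaction relevance score.

    query_emb: (num_query_tokens, dim) L2-normalized token embeddings
    doc_emb:   (num_doc_tokens, dim)   L2-normalized token embeddings
    """
    # Cosine similarity between every query token and every document token.
    sim = query_emb @ doc_emb.T          # (num_query_tokens, num_doc_tokens)
    # MaxSim: each query token keeps only its best-matching document token,
    # and the per-token maxima are summed into the final score.
    return float(sim.max(axis=1).sum())

# Toy usage with random, L2-normalized embeddings (dim=64).
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 64))
d = rng.normal(size=(120, 64))
q /= np.linalg.norm(q, axis=1, keepdims=True)
d /= np.linalg.norm(d, axis=1, keepdims=True)
score = maxsim_score(q, d)
```

Because documents are stored as per-token embedding matrices rather than a single vector, the per-token `dim` and the model's parameter count dominate the memory footprint, which is why shrinking to 17M/32M parameters matters for on-device deployment.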
Problem

Research questions and friction points this paper is trying to address.

Developing efficient small retrieval models for edge devices
Improving late-interaction models through ablation studies
Enhancing performance on short-text and long-context tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces mxbai-edge-colbert-v0 with 17M and 32M parameters
Improves retrieval and late-interaction models for efficiency
Outperforms ColBERTv2 on short-text and long-context tasks