Abstract
We introduce the Granite Embedding models, a family of encoder-based embedding models designed for retrieval tasks, spanning dense and sparse retrieval architectures, with both English and multilingual capabilities. This report provides the technical details of training these highly effective 12-layer embedding models, along with their efficient 6-layer distilled counterparts. Extensive evaluations show that the models, developed with techniques like retrieval-oriented pretraining, contrastive fine-tuning, knowledge distillation, and model merging, significantly outperform publicly available models of similar sizes on internal IBM retrieval and search tasks, and perform on par with them on widely used information retrieval benchmarks, while being trained on high-quality data suitable for enterprise use. We publicly release all our Granite Embedding models under the Apache 2.0 license, allowing both research and commercial use, at https://huggingface.co/collections/ibm-granite.
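To illustrate how dense embedding models like these are typically used for retrieval, here is a minimal, self-contained sketch of similarity-based ranking. The embedding vectors below are hypothetical stand-ins; in practice they would come from encoding text with one of the released models (for example via a library such as `sentence-transformers`), not from the values shown here.

```python
# Minimal sketch of dense retrieval scoring: rank documents by the
# cosine similarity of their embeddings to a query embedding.
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(query_vec, doc_vecs):
    """Return document indices sorted from most to least similar."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Stub embeddings standing in for model outputs (hypothetical values).
query = [0.9, 0.1, 0.0]
docs = [
    [0.1, 0.9, 0.0],  # off-topic
    [0.8, 0.2, 0.1],  # relevant
    [0.0, 0.0, 1.0],  # unrelated
]

print(rank(query, docs))  # → [1, 0, 2]: the relevant document ranks first
```

In a real pipeline, document embeddings are precomputed and indexed so that only the query is encoded at search time; this is where the smaller 6-layer distilled models trade a little accuracy for lower query-encoding latency.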