Granite Embedding R2 Models

📅 2025-08-26
🤖 AI Summary
To address the challenge of balancing accuracy and efficiency for enterprise-scale dense retrieval across heterogeneous domains (text, code, long documents, multi-turn conversations, and tabular data), this paper introduces the Granite Embedding R2 family of high-performance English encoder-based embedding models. Methodologically, the release pairs bi-encoder retrievers (a highly effective 22-layer model and an efficient 12-layer variant) with a cross-encoder reranker, all trained exclusively on enterprise-appropriate data under governance oversight; the models support 8,192-token contexts (a 16x expansion over the first generation) and deliver 19-44% speed advantages over leading competitors while maintaining superior accuracy. Contributions include state-of-the-art results across standard benchmarks, IBM-developed evaluation suites, and real-world enterprise deployments, with all models open-sourced under the Apache 2.0 license.
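The core operation behind the bi-encoder retrievers described above is that queries and documents are embedded independently and ranked by vector similarity. A minimal sketch in plain Python, using toy 3-d vectors as stand-ins for real model embeddings (the actual Granite encoders are not called here):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy 3-d "embeddings"; a real system would obtain these from the encoder.
query_vec = [0.9, 0.1, 0.0]
doc_vecs = {
    "doc_a": [0.8, 0.2, 0.1],   # topically close to the query
    "doc_b": [0.0, 0.1, 0.9],   # unrelated
}

# Rank documents by similarity to the query; the best match comes first.
ranked = sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]),
                reverse=True)
print(ranked[0])  # doc_a scores highest
```

Because document vectors can be precomputed and indexed, this stage scales to large corpora, which is what makes the bi-encoder the fast first stage of the pipeline.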

📝 Abstract
We introduce the Granite Embedding R2 models, a comprehensive family of high-performance English encoder-based embedding models engineered for enterprise-scale dense retrieval applications. Building upon our first-generation release, these models deliver substantial improvements, including 16x expanded context length (8,192 tokens), state-of-the-art performance across diverse retrieval domains - text, code, long-document search, multi-turn conversational, and tabular data - and measurable speed advantages of 19-44% over leading competitors while maintaining superior accuracy. Our release encompasses both bi-encoder and cross-encoder architectures, featuring a highly effective 22-layer retriever model and its efficient 12-layer counterpart, alongside a high-quality reranker model, all trained exclusively on enterprise-appropriate data with comprehensive governance oversight. The models demonstrate exceptional versatility across standard benchmarks, IBM-developed evaluation suites, and real-world enterprise use cases, establishing new performance standards for open-source embedding models. In an era where retrieval speed and accuracy are paramount for competitive advantage, the Granite R2 models deliver a compelling combination of cutting-edge performance, enterprise-ready licensing, and transparent data provenance that organizations require for mission-critical deployments. All models are publicly available under the Apache 2.0 license at https://huggingface.co/collections/ibm-granite, enabling unrestricted research and commercial use.
Problem

Research questions and friction points this paper is trying to address.

Develop high-performance embedding models for enterprise retrieval
Achieve state-of-the-art accuracy with significant speed improvements
Provide versatile retrieval capabilities across diverse data types
Innovation

Methods, ideas, or system contributions that make the work stand out.

Expanded context length to 8,192 tokens
Combined bi-encoder and cross-encoder architectures
Enterprise data training with governance oversight
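The combined bi-encoder and cross-encoder design above is typically deployed as retrieve-then-rerank: the fast bi-encoder scores the whole corpus, then the more accurate but slower cross-encoder re-scores only the short list of candidates. A minimal sketch with stub scorers (token overlap stands in for the actual Granite retriever and reranker, which this sketch does not call):

```python
def bi_encoder_score(query: str, doc: str) -> float:
    """Stub for fast vector similarity: fraction of query tokens in the doc."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def cross_encoder_score(query: str, doc: str) -> float:
    """Stub for the slower joint query-document relevance model: Jaccard overlap."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q | d), 1)

def retrieve_then_rerank(query, corpus, k=2):
    # Stage 1: cheap score over the entire corpus, keep the top-k candidates.
    candidates = sorted(corpus, key=lambda d: bi_encoder_score(query, d),
                        reverse=True)[:k]
    # Stage 2: expensive score over only the k candidates.
    return sorted(candidates, key=lambda d: cross_encoder_score(query, d),
                  reverse=True)

corpus = [
    "granite embedding models for enterprise retrieval",
    "cooking recipes for pasta",
    "retrieval of code snippets with embedding models",
    "history of the roman empire",
]
print(retrieve_then_rerank("embedding models for retrieval", corpus))
```

The two-stage split is what lets the paper claim both speed (the bi-encoder touches every document but is cheap) and accuracy (the cross-encoder sees query and document jointly but only for a handful of candidates).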