🤖 AI Summary
This work addresses the challenge of building a general-purpose multilingual and multimodal text embedding model that uniformly supports diverse downstream tasks, including classification, retrieval, clustering, similarity estimation, and ranking. To this end, the authors propose Gemini Embedding, the first Gemini-based universal embedding model, which combines prompt engineering, representation distillation, multi-task supervised fine-tuning, and large-scale multilingual contrastive learning to distill Gemini's strong multilingual and code understanding into precomputable, plug-and-play embedding vectors. The model supports over 250 languages as well as code. Evaluated on the MMTEB benchmark of more than 100 tasks, it consistently outperforms specialized domain-specific models across languages, tasks, and modalities, achieving state-of-the-art performance with a single unified model and demonstrating the viability of a universal embedding paradigm driven by one large model.
📝 Abstract
In this report, we introduce Gemini Embedding, a state-of-the-art embedding model leveraging the power of Gemini, Google's most capable large language model. Capitalizing on Gemini's inherent multilingual and code understanding capabilities, Gemini Embedding produces highly generalizable embeddings for text spanning numerous languages and textual modalities. The representations generated by Gemini Embedding can be precomputed and applied to a variety of downstream tasks including classification, similarity, clustering, ranking, and retrieval. Evaluated on the Massive Multilingual Text Embedding Benchmark (MMTEB), which includes over one hundred tasks across 250+ languages, Gemini Embedding substantially outperforms prior state-of-the-art models, demonstrating considerable improvements in embedding quality. Achieving state-of-the-art performance across MMTEB's multilingual, English, and code benchmarks, our unified model demonstrates strong capabilities across a broad selection of tasks and surpasses specialized domain-specific models.
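Both the summary and the abstract stress that the embeddings can be precomputed once and then reused across downstream tasks such as retrieval and similarity. A minimal sketch of that pattern, using random vectors as stand-ins for real model outputs (the embedding call itself is not shown; this illustrates only the downstream cosine-similarity retrieval step):

```python
import numpy as np

def cosine_retrieve(query_vec, corpus_vecs, top_k=3):
    """Rank precomputed corpus embeddings by cosine similarity to a query."""
    q = query_vec / np.linalg.norm(query_vec)
    c = corpus_vecs / np.linalg.norm(corpus_vecs, axis=1, keepdims=True)
    scores = c @ q                      # cosine similarity per document
    order = np.argsort(-scores)[:top_k] # indices of the top_k best matches
    return order, scores[order]

# Stand-in embeddings: in practice these would be produced by the embedding
# model once, stored, and reused for retrieval, clustering, ranking, etc.
rng = np.random.default_rng(0)
corpus = rng.normal(size=(100, 64))              # 100 precomputed document vectors
query = corpus[42] + 0.01 * rng.normal(size=64)  # slight perturbation of doc 42

idx, scores = cosine_retrieve(query, corpus)
# doc 42 should rank first, since the query is nearly identical to it
```

The key point is that the expensive model forward pass happens once per document; all downstream tasks then operate on the cached vectors with cheap linear-algebra operations.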