🤖 AI Summary
Current generative retrieval models are task-specific, generalize poorly, and underperform embedding-based approaches. This paper introduces the first general-purpose generative multimodal retrieval framework: instead of matching continuous embeddings, it directly generates the identifiers (IDs) of target data, enabling unified retrieval across modalities, tasks, and domains. The method has three core components: (1) modality-decoupled semantic quantization, which maps multimodal data to discrete IDs that encode both modality and semantics; (2) query interpolation augmentation, which mixes a query with its target to improve generalization and robustness to varied query forms; and (3) a lightweight ID generator for efficient inference. On the M-BEIR benchmark, the approach outperforms existing generative baselines by a clear margin. Retrieval speed stays consistently high as the corpus grows, single-stage performance is competitive with mainstream embedding-based methods, and with re-ranking it approaches the performance of state-of-the-art embedding-based systems.
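To make "discrete IDs that encode both modality and semantics" concrete, here is a minimal sketch, assuming the ID is built as a modality code followed by codes from residual quantization of a multimodal embedding. The modality codes, toy codebooks, and function names (`quantize`, `nearest`, `MODALITY_CODES`) are hypothetical illustrations, not the paper's actual implementation.

```python
from typing import Dict, List, Sequence

# Hypothetical modality codes placed in the first ID position (illustrative assumption).
MODALITY_CODES: Dict[str, int] = {"image": 0, "text": 1, "image-text": 2}


def nearest(codebook: Sequence[Sequence[float]], vec: Sequence[float]) -> int:
    """Return the index of the codebook entry closest to vec (squared Euclidean distance)."""
    best_idx, best_dist = 0, float("inf")
    for idx, center in enumerate(codebook):
        dist = sum((c - v) ** 2 for c, v in zip(center, vec))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def quantize(embedding: Sequence[float], modality: str,
             codebooks: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Map an embedding to a discrete ID: [modality code, level-1 code, level-2 code, ...].

    The modality is decoupled into its own leading code; each later level
    quantizes the residual left over by the previous level (residual quantization).
    """
    ids = [MODALITY_CODES[modality]]
    residual = list(embedding)
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        ids.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return ids


# Toy 2-D codebooks (hypothetical): a coarse first level and a finer second level.
codebooks = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.1, 0.0], [0.0, 0.1], [-0.1, 0.0]],
]
text_id = quantize([0.9, 0.05], "text", codebooks)    # [1, 0, 2]
image_id = quantize([0.9, 0.05], "image", codebooks)  # [0, 0, 2]
```

The two items share the same semantic codes but differ in the leading modality code, which is one way a single ID space can keep modality and semantics separable.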
📝 Abstract
Generative retrieval is an emerging approach in information retrieval that generates identifiers (IDs) of target data based on a query, providing an efficient alternative to traditional embedding-based retrieval methods. However, existing models are task-specific and fall short of embedding-based retrieval in performance. This paper proposes GENIUS, a universal generative retrieval framework supporting diverse tasks across multiple modalities and domains. At its core, GENIUS introduces modality-decoupled semantic quantization, transforming multimodal data into discrete IDs encoding both modality and semantics. Moreover, to enhance generalization, we propose a query augmentation technique that interpolates between a query and its target, allowing GENIUS to adapt to varied query forms. Evaluated on the M-BEIR benchmark, it surpasses prior generative methods by a clear margin. Unlike embedding-based retrieval, GENIUS consistently maintains high retrieval speed across database sizes, with competitive performance across multiple benchmarks. With additional re-ranking, GENIUS often achieves results close to those of embedding-based methods while preserving efficiency.
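The query augmentation that "interpolates between a query and its target" can be sketched as follows, assuming it operates on embeddings and that the mixing weight is drawn from a Beta distribution. The Beta sampling, the `alpha` value, and the names `interpolate_query` and `augment` are illustrative assumptions, not details confirmed by the abstract.

```python
import random
from typing import List, Sequence, Tuple


def interpolate_query(query_emb: Sequence[float], target_emb: Sequence[float],
                      lam: float) -> List[float]:
    """Convex combination lam * query + (1 - lam) * target."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * q + (1.0 - lam) * t for q, t in zip(query_emb, target_emb)]


def augment(query_emb: Sequence[float], target_emb: Sequence[float],
            target_id: List[int], alpha: float = 2.0, n: int = 4,
            seed: int = 0) -> List[Tuple[List[float], List[int]]]:
    """Create n augmented (mixed query, target ID) training pairs.

    Every mixed query keeps the original target ID as its label, so an ID
    generator trained on these pairs sees many query forms lying between
    the query and its target that all map to the same ID.
    Sampling lam from Beta(alpha, alpha) is an assumption for illustration.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        lam = rng.betavariate(alpha, alpha)
        pairs.append((interpolate_query(query_emb, target_emb, lam), target_id))
    return pairs


query, target = [1.0, 0.0], [0.0, 1.0]
mixed = interpolate_query(query, target, 0.25)  # [0.25, 0.75]
pairs = augment(query, target, target_id=[1, 0, 2])
```

Because each augmented query lies on the segment between the original query and its target, the labels stay valid while the inputs vary, which is the intuition behind adapting to varied query forms.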