GENIUS: A Generative Framework for Universal Multimodal Search

📅 2025-03-25

🤖 AI Summary

Current generative retrieval models are task-specific, generalize poorly, and underperform embedding-based approaches. This paper introduces a general-purpose generative multimodal retrieval framework that replaces nearest-neighbor search over continuous embeddings with direct generation of target data identifiers (IDs), enabling unified retrieval across modalities, tasks, and domains. The method has three core components: (1) modality-decoupled semantic quantization, which maps multimodal inputs to discrete IDs encoding both modality and semantics; (2) query interpolation augmentation, which improves zero-shot generalization and robustness to varied or noisy queries; and (3) a lightweight ID generator for efficient inference. On the M-BEIR benchmark, the approach clearly outperforms existing generative baselines. Retrieval speed stays high as the corpus grows, single-stage accuracy is competitive with mainstream embedding methods, and with additional re-ranking the results come close to those of state-of-the-art embedding-based systems.
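The summary names modality-decoupled semantic quantization without detailing it. The sketch below is one plausible reading, not the paper's implementation: it assumes the first ID token encodes only the modality and the remaining tokens come from residual quantization of an embedding over a stack of codebooks. The `MODALITY_CODE` table, the toy codebooks, and all function names are hypothetical; a real system would learn the codebooks from data.

```python
from typing import Dict, List, Sequence

Vector = List[float]

# Hypothetical modality codes: the first ID token carries modality only.
MODALITY_CODE: Dict[str, int] = {"image": 0, "text": 1, "image_text": 2}


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(embedding: Sequence[float], modality: str,
             codebooks: List[List[Vector]]) -> List[int]:
    """Map an embedding to a discrete ID [modality code, c1, c2, ...].

    Each level picks the codeword nearest to the residual left by the
    previous levels (residual quantization), so later codes refine earlier ones.
    """
    ids = [MODALITY_CODE[modality]]
    residual = list(embedding)
    for codebook in codebooks:
        k = min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], residual))
        ids.append(k)
        residual = [r - c for r, c in zip(residual, codebook[k])]
    return ids


def reconstruct(ids: List[int], codebooks: List[List[Vector]]) -> Vector:
    """Sum the selected codewords, skipping the leading modality code."""
    out = [0.0] * len(codebooks[0][0])
    for level, k in enumerate(ids[1:]):
        out = [o + c for o, c in zip(out, codebooks[level][k])]
    return out


# Toy hand-written codebooks (hypothetical; normally learned).
codebooks = [
    [[1.0, 0.0], [0.0, 1.0]],               # level 1: coarse directions
    [[0.1, 0.0], [0.0, 0.1], [-0.1, 0.0]],  # level 2: fine corrections
]
print(quantize([0.9, 0.05], "text", codebooks))   # [1, 0, 2]
print(quantize([0.9, 0.05], "image", codebooks))  # [0, 0, 2]
```

The same embedding tagged as `"text"` or `"image"` receives identical semantic codes and differs only in the leading modality token, which is the property that lets a generator decode modality and content separately under this reading.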

📝 Abstract

Generative retrieval is an emerging approach in information retrieval that generates identifiers (IDs) of target data based on a query, providing an efficient alternative to traditional embedding-based retrieval methods. However, existing models are task-specific and fall short of embedding-based retrieval in performance. This paper proposes GENIUS, a universal generative retrieval framework supporting diverse tasks across multiple modalities and domains. At its core, GENIUS introduces modality-decoupled semantic quantization, transforming multimodal data into discrete IDs encoding both modality and semantics. Moreover, to enhance generalization, we propose a query augmentation that interpolates between a query and its target, allowing GENIUS to adapt to varied query forms. Evaluated on the M-BEIR benchmark, GENIUS surpasses prior generative methods by a clear margin. Unlike embedding-based retrieval, GENIUS consistently maintains high retrieval speed across database sizes, with competitive performance across multiple benchmarks. With additional re-ranking, GENIUS often achieves results close to those of embedding-based methods while preserving efficiency.
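The abstract describes the query augmentation only as interpolating between a query and its target. A minimal sketch, assuming the interpolation is done in a shared embedding space with a mixing weight drawn from a Beta(α, α) distribution (the distribution, `alpha`, and the function names are all assumptions): each augmented query keeps the original target's ID as its training label, so the ID generator learns that a range of query forms should decode to the same target.

```python
import random
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def interpolate(query: Sequence[float], target: Sequence[float],
                lam: float) -> Vector:
    """Return the convex combination lam * query + (1 - lam) * target."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * q + (1.0 - lam) * t for q, t in zip(query, target)]


def augment_batch(pairs: List[Tuple[Vector, Vector, List[int]]],
                  alpha: float = 2.0,
                  rng: Optional[random.Random] = None
                  ) -> List[Tuple[Vector, List[int]]]:
    """For each (query_emb, target_emb, target_id), emit the original pair
    plus one interpolated query that is still labeled with target_id."""
    rng = rng or random.Random()
    out: List[Tuple[Vector, List[int]]] = []
    for q, t, target_id in pairs:
        out.append((list(q), target_id))                 # original training pair
        lam = rng.betavariate(alpha, alpha)              # assumed Beta(alpha, alpha) weight
        out.append((interpolate(q, t, lam), target_id))  # augmented pair, same label
    return out


print(interpolate([1.0, 0.0], [0.0, 1.0], 0.25))  # [0.25, 0.75]
```

Drawing the weight at random, rather than fixing it, exposes the generator to queries anywhere between the original query and the target itself; this is the intuition behind the claimed gain in generalization, though the paper's exact sampling scheme may differ.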

Problem

Research questions and friction points this paper is trying to address.

Develops a universal generative retrieval framework for multimodal tasks

Improves performance and generalization over task-specific generative models

Maintains high retrieval speed and competitive accuracy across benchmarks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Modality-decoupled semantic quantization for multimodal data

Query augmentation for improved generalization

Consistent high-speed retrieval across database sizes
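Why retrieval speed need not degrade with database size follows from how generative retrieval decodes IDs. A minimal sketch, assuming fixed-length IDs and decoding constrained by a prefix trie of valid IDs (a common generative-retrieval technique; whether GENIUS uses exactly this search is an assumption): the `scorer` callable is a hypothetical stand-in for the learned ID generator, and every name below is illustrative.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical stand-in for the learned ID generator: given the prefix decoded
# so far, return a log-probability for each candidate next token.
Scorer = Callable[[Tuple[int, ...]], Dict[int, float]]


def build_trie(doc_ids: Sequence[Sequence[int]]) -> dict:
    """Build a prefix trie (nested dicts) over all valid document IDs."""
    root: dict = {}
    for doc_id in doc_ids:
        node = root
        for tok in doc_id:
            node = node.setdefault(tok, {})
    return root


def constrained_beam_search(trie: dict, scorer: Scorer, length: int,
                            beam: int) -> List[Tuple[float, Tuple[int, ...]]]:
    """Decode fixed-length IDs, expanding only tokens allowed by the trie.

    Each step scores at most `beam` prefixes times their trie children, so
    the work depends on ID length, beam size, and codebook size rather than
    on the number of documents in the corpus.
    """
    beams = [(0.0, (), trie)]  # (cumulative log-prob, prefix, trie node)
    for _ in range(length):
        candidates = []
        for score, prefix, node in beams:
            logp = scorer(prefix)
            for tok, child in node.items():  # only valid continuations
                candidates.append(
                    (score + logp.get(tok, float("-inf")), prefix + (tok,), child))
        beams = heapq.nlargest(beam, candidates, key=lambda c: c[0])
    return [(score, prefix) for score, prefix, _ in beams]


corpus = [(1, 0, 2), (1, 0, 1), (0, 3, 3)]
prefs = {0: {5: -0.01, 1: -0.1, 0: -2.0},  # token 5 is not a valid ID start
         1: {0: -0.2, 3: -1.5},
         2: {2: -0.1, 1: -1.0, 3: -0.5}}
results = constrained_beam_search(build_trie(corpus), lambda p: prefs[len(p)],
                                  length=3, beam=2)
print([prefix for _, prefix in results])  # [(1, 0, 2), (1, 0, 1)]
```

Token 5 scores highest at the first step but is never emitted because no stored ID starts with it. Adding documents enlarges the trie without increasing the per-step work, whereas embedding-based retrieval must search all stored vectors or an index built over them.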

🔎 Similar Papers

ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling