MERIT: Multilingual Semantic Retrieval with Interleaved Multi-Condition Query

📅 2025-06-03
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing semantic retrieval benchmarks are constrained to monolingual, single-image, or single-condition settings, failing to model the complex interleaved multilingual, multi-image, multi-condition queries prevalent in real-world scenarios. To address this, we introduce MERIT, the first benchmark supporting interleaved multilingual (5 languages), multi-image, multi-condition queries, comprising 320K queries over 135K products across 7 categories. Experiments on MERIT expose a fundamental limitation of mainstream models: they capture global semantics while neglecting fine-grained condition modeling. We propose Coral, a novel fine-tuning framework that adapts pre-trained MLLMs by unifying embedding reconstruction, which preserves fine-grained conditional elements, with contrastive learning, which extracts holistic global semantics. On MERIT, Coral achieves a 45.9% relative improvement over strong baselines and generalizes across eight established retrieval benchmarks, significantly improving accuracy and robustness on complex multi-condition queries.
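
As a concrete illustration of the mechanism the summary describes, below is a minimal PyTorch sketch of a combined objective that pairs in-batch contrastive learning with an embedding-reconstruction term. The function name `coral_loss`, the MSE reconstruction target, and the loss weighting are assumptions made for illustration; the paper's actual implementation may differ.

```python
# Hypothetical sketch of a Coral-style objective: an InfoNCE-style
# contrastive loss for global semantics plus an embedding-reconstruction
# loss intended to preserve fine-grained conditional elements.
# Names, weighting, and the reconstruction target are assumptions,
# not the paper's actual implementation.
import torch
import torch.nn.functional as F

def coral_loss(query_emb, pos_emb, token_embs, recon_token_embs,
               temperature=0.07, recon_weight=1.0):
    """query_emb, pos_emb: (B, D) pooled query/product embeddings.
    token_embs: (B, T, D) original token-level embeddings of the query.
    recon_token_embs: (B, T, D) embeddings reconstructed from the pooled vector.
    """
    q = F.normalize(query_emb, dim=-1)
    p = F.normalize(pos_emb, dim=-1)

    # Contrastive term: in-batch negatives, with the matched
    # query/product pairs on the diagonal of the similarity matrix.
    logits = q @ p.t() / temperature
    targets = torch.arange(q.size(0), device=q.device)
    contrastive = F.cross_entropy(logits, targets)

    # Reconstruction term: push the pooled embedding to retain enough
    # information to recover the token-level (condition-level) features.
    reconstruction = F.mse_loss(recon_token_embs, token_embs)

    return contrastive + recon_weight * reconstruction
```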

📝 Abstract
Semantic retrieval is crucial for modern applications yet remains underexplored in current research. Existing datasets are limited to single languages, single images, or single retrieval conditions, and they often fail to fully exploit the expressive capacity of visual information, as evidenced by performance that is maintained when images are replaced with captions. Practical retrieval scenarios, however, frequently involve interleaved multi-condition queries with multiple images. Hence, this paper introduces MERIT, the first multilingual dataset for interleaved multi-condition semantic retrieval, comprising 320,000 queries over 135,000 products in 5 languages and covering 7 distinct product categories. Extensive experiments on MERIT identify a key limitation of existing models: they focus solely on global semantic information while neglecting the specific conditional elements in queries. Consequently, we propose Coral, a novel fine-tuning framework that adapts pre-trained MLLMs by integrating embedding reconstruction, which preserves fine-grained conditional elements, with contrastive learning, which extracts comprehensive global semantics. Experiments demonstrate that Coral achieves a 45.9% performance improvement over conventional approaches on MERIT, with strong generalization capabilities validated across 8 established retrieval benchmarks. Collectively, our contributions (a novel dataset, the identification of critical limitations in existing approaches, and an innovative fine-tuning framework) establish a foundation for future research in interleaved multi-condition semantic retrieval.
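
To make the notion of an interleaved multi-condition query concrete, here is a small Python sketch of one possible query representation and a cosine-similarity retrieval step. The `QuerySegment` schema and the pooled-embedding `retrieve` helper are hypothetical illustrations; MERIT's actual data format and the models' scoring are not specified at this level of detail.

```python
# A minimal sketch of what an interleaved multi-condition query could look
# like, plus top-k retrieval by cosine similarity. The QuerySegment schema
# and field names are hypothetical, not MERIT's actual format.
from dataclasses import dataclass

import numpy as np

@dataclass
class QuerySegment:
    kind: str   # "text" or "image"
    value: str  # a textual condition, or a path/URL to a reference image

# Interleaved query mixing textual conditions with multiple reference images,
# e.g. "a dress with the neckline of <img1> and the floral pattern of <img2>, in red".
query = [
    QuerySegment("text", "a dress with the neckline of"),
    QuerySegment("image", "ref_neckline.jpg"),
    QuerySegment("text", "and the floral pattern of"),
    QuerySegment("image", "ref_pattern.jpg"),
    QuerySegment("text", "in red"),
]

def retrieve(query_emb: np.ndarray, product_embs: np.ndarray, k: int = 10) -> np.ndarray:
    """Rank products by cosine similarity to the pooled query embedding.

    query_emb: (D,) embedding of the full interleaved query.
    product_embs: (N, D) embeddings of the product index.
    Returns the indices of the top-k products.
    """
    q = query_emb / np.linalg.norm(query_emb)
    p = product_embs / np.linalg.norm(product_embs, axis=1, keepdims=True)
    scores = p @ q                      # (N,) cosine similarities
    return np.argsort(-scores)[:k]
```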
Problem

Research questions and friction points this paper is trying to address.

Addressing underexplored multilingual interleaved multi-condition semantic retrieval
Overcoming limitations of single-language, single-image, or single-condition retrieval datasets
Improving model focus on fine-grained conditional elements in queries
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multilingual dataset for multi-condition retrieval
Embedding reconstruction preserves fine-grained elements
Contrastive learning extracts global semantics