ViDoRe Benchmark V2: Raising the Bar for Visual Retrieval

📅 2025-05-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: ViDoRe V1 has reached performance saturation (top models exceed 90% nDCG@5), which limits its ability to discriminate between models. Method: ViDoRe V2, a next-generation multilingual visual document retrieval benchmark, addresses V1's limitations through a "blind-context query" paradigm, long and cross-document queries, hybrid human-AI query generation, and four real-world multilingual datasets. V2 combines synthetic data generation, expert human validation, multilingual evaluation, and nDCG@5 as its metric. Contribution/Results: Experiments reveal persistent bottlenecks in state-of-the-art models in multilingual generalization and long-context understanding, confirming that V2 is more discriminative and closer to real-world use than V1. As a living benchmark, ViDoRe V2 is intended to support sustained, iterative progress in visual retrieval research.

📝 Abstract
The ViDoRe Benchmark V1 was approaching saturation with top models exceeding 90% nDCG@5, limiting its ability to discern improvements. ViDoRe Benchmark V2 introduces realistic, challenging retrieval scenarios via blind contextual querying, long and cross-document queries, and a hybrid synthetic and human-in-the-loop query generation process. It comprises four diverse, multilingual datasets and provides clear evaluation instructions. Initial results demonstrate substantial room for advancement and highlight insights on model generalization and multilingual capability. This benchmark is designed as a living resource, inviting community contributions to maintain relevance through future evaluations.
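Since the saturation claim rests on nDCG@5, a minimal sketch of how that metric is commonly computed may help. It is not the benchmark's own evaluation code. It assumes the linear-gain variant of DCG and that `ranked_relevances` (a hypothetical name) holds the relevance grade of every candidate page for one query, in the order the model ranked them.

```python
import math


def dcg_at_k(relevances, k=5):
    """Discounted cumulative gain over the first k relevance grades.

    Assumption: linear gain (rel / log2(rank + 1) with 1-based ranks).
    """
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, k=5):
    """nDCG@k: DCG of the model's ranking divided by DCG of the ideal ranking.

    ranked_relevances must cover all candidates for the query, so that the
    ideal ordering can be derived by sorting the same grades.
    """
    ideal_dcg = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # no relevant page exists for this query
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


if __name__ == "__main__":
    # One relevant page, retrieved at rank 2 out of 6 candidates.
    print(round(ndcg_at_k([0, 1, 0, 0, 0, 0]), 4))
```

A benchmark score is then typically the mean of this value over all queries, so a score above 90% leaves little headroom to separate strong models.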
Problem

Research questions and friction points this paper is trying to address.

Enhances visual retrieval with challenging realistic scenarios
Introduces multilingual datasets for diverse evaluation
Encourages community contributions for sustained relevance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Blind contextual querying for realistic scenarios
Hybrid synthetic and human query generation
Multilingual datasets with clear evaluation
Quentin Macé
Illuin Technology
António Loison
Illuin Technology
Manuel Faysse
CentraleSupélec - Université Paris Saclay
Natural Language Processing, Machine Learning, Privacy