SIGIR 2025 -- LiveRAG Challenge Report

📅 2025-07-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the joint optimization of retrieval and prompting strategies in RAG systems. Using a fixed corpus (Fineweb-10BT) and a common open-source model (Falcon3-10B-Instruct), the organizers ran a timed question-answering competition involving 70 international teams. A two-stage evaluation paradigm, combining LLM-as-a-judge scoring with human verification, enabled fine-grained, reproducible assessment of answer correctness and faithfulness. Systematic evaluation on 500 unseen questions benchmarked the generalization capabilities of diverse RAG architectures and surfaced design principles for effective retrieval and robust prompting. Published at SIGIR 2025, the report establishes a standardized empirical benchmark for RAG research, accompanied by an open dataset and a fully reproducible evaluation protocol.

📝 Abstract
The LiveRAG Challenge at SIGIR 2025, held between March and May 2025, provided a competitive platform for advancing Retrieval-Augmented Generation (RAG) technologies. Participants from academia and industry were invited to develop a RAG-based question-answering system using a fixed corpus (Fineweb-10BT) and a common open-source LLM (Falcon3-10B-Instruct). The goal was to facilitate challenging comparisons of retrieval and prompting strategies. During the Live Challenge Day, 70 teams from 27 different countries provided answers and supportive information to 500 unseen questions within a strict two-hour time window. Evaluation was conducted in two stages: first, an automated LLM-as-a-judge approach was used to compute correctness and faithfulness scores; then a manual review of top-ranked submissions was conducted. The finalists were announced on June 12, 2025, with prizes awarded during the LiveRAG Workshop at SIGIR 2025 in Padua, Italy.
Problem

Research questions and friction points this paper is trying to address.

Advancing Retrieval-Augmented Generation (RAG) technologies competitively
Comparing retrieval and prompting strategies in RAG systems
Evaluating correctness and faithfulness of RAG-based question-answering
Innovation

Methods, ideas, or system contributions that make the work stand out.

Used Fineweb-10BT corpus for RAG system
Employed Falcon3-10B-Instruct open-source LLM
Applied LLM-as-a-judge evaluation approach
David Carmel
Technology Innovation Institute (TII), Haifa, Israel
Simone Filice
Technology Innovation Institute (TII)
Natural Language Processing · Information Retrieval · Machine Learning
Guy Horowitz
Technology Innovation Institute (TII), Haifa, Israel
Yoelle Maarek
Chief Researcher AI/IR, TII
Oren Somekh
Technology Innovation Institute
Recommendation Systems · Online Advertising · Machine Learning · LLM RAG
Ran Tavory
Technology Innovation Institute (TII), Haifa, Israel