🤖 AI Summary
This work addresses the joint optimization of retrieval and prompting strategies in RAG systems. Using a shared corpus (Fineweb-10BT) and an open-source model (Falcon3-10B-Instruct), the organizers ran a timed question-answering competition involving 70 international teams. A two-stage evaluation paradigm, combining LLM-as-a-judge scoring with human verification, enables fine-grained, reproducible assessment of answer correctness and faithfulness. Systematic evaluation on 500 unseen questions benchmarked the generalization of diverse RAG architectures and surfaced key design principles for efficient retrieval and robust prompting. The study, published at SIGIR 2025, establishes a standardized empirical benchmark for RAG research, accompanied by an open dataset and a fully reproducible evaluation protocol.
📝 Abstract
The LiveRAG Challenge at SIGIR 2025, held between March and May 2025, provided a competitive platform for advancing Retrieval-Augmented Generation (RAG) technologies. Participants from academia and industry were invited to develop a RAG-based question-answering system using a fixed corpus (Fineweb-10BT) and a common open-source LLM (Falcon3-10B-Instruct), with the goal of enabling direct comparisons of retrieval and prompting strategies. During the Live Challenge Day, 70 teams from 27 countries submitted answers and supporting evidence for 500 unseen questions within a strict two-hour window. Evaluation was conducted in two stages: first, an automated LLM-as-a-judge approach computed correctness and faithfulness scores; then, the top-ranked submissions were manually reviewed. The finalists were announced on June 12, 2025, with prizes awarded during the LiveRAG Workshop at SIGIR 2025 in Padua, Italy.
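The two-stage evaluation described above can be sketched as follows. This is a minimal illustration, not the challenge's actual protocol: the judge prompt, the 0-2 score scale, the reply format, and the ranking key are all assumptions made for the example, and the call to a judge model is left out.

```python
import re

# Hypothetical judge prompt (illustrative; not the challenge's actual rubric).
JUDGE_PROMPT = """You are an impartial judge. Given a question, a system answer,
and the passages the system cited as support, rate:
- correctness (0-2): does the answer accurately address the question?
- faithfulness (0-2): is the answer grounded in the cited passages?
Question: {question}
Answer: {answer}
Passages: {passages}
Reply exactly as: correctness=<n> faithfulness=<n>"""


def parse_judge_reply(reply: str) -> dict:
    """Stage 1 helper: extract the two scores from a judge model's reply."""
    m = re.search(r"correctness=(\d)\s+faithfulness=(\d)", reply)
    if not m:
        raise ValueError(f"unparseable judge reply: {reply!r}")
    return {"correctness": int(m.group(1)), "faithfulness": int(m.group(2))}


def rank_teams(judged: dict) -> list:
    """Rank teams by mean correctness, tie-broken by mean faithfulness.
    Stage 2 (manual review of the top of this ranking) happens offline.
    `judged` maps team name -> list of per-question score dicts."""
    def key(team):
        scores = judged[team]
        n = len(scores)
        return (sum(s["correctness"] for s in scores) / n,
                sum(s["faithfulness"] for s in scores) / n)
    return sorted(judged, key=key, reverse=True)
```

In this sketch the automated stage produces a full ranking, and the human-verification stage only needs to inspect the head of that list, which keeps the manual effort bounded even with 70 participating teams.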