AudioRAG: A Challenging Benchmark for Audio Reasoning and Information Retrieval

📅 2026-02-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a critical gap in the evaluation of Large Audio-Language Models (LALMs): current benchmarks are confined to reasoning over internal knowledge and fail to reflect real-world scenarios that require external information retrieval. To bridge this gap, we propose the first agent framework that integrates audio understanding with retrieval-augmented generation (RAG) and introduce the first audio question-answering benchmark that supports external retrieval, comprising high-quality QA pairs that are automatically generated and human-verified. Experimental results demonstrate that state-of-the-art LALMs perform substantially worse on this new task, highlighting the limitations of existing approaches. Our framework not only fills this evaluation void but also establishes a stronger baseline system, thereby laying a foundation for future research in retrieval-augmented audio-language modeling.

📝 Abstract
Recent advancements in Large Audio-Language Models (LALMs) have demonstrated remarkable performance across a range of sound-, speech-, and music-related tasks, spurring growing interest in benchmarks to assess these models. Existing benchmarks generally focus only on reasoning with internal knowledge, neglecting real-world scenarios that require external information grounding. To bridge this gap, we introduce AudioRAG, a novel benchmark designed to evaluate audio-based reasoning augmented by information retrieval in realistic web environments. The benchmark comprises both LLM-generated and manually curated question-answer pairs. Our evaluations reveal that even state-of-the-art LALMs struggle to answer these questions. We therefore propose an agentic pipeline that integrates audio reasoning with retrieval-augmented generation, providing a stronger baseline for future research.
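The abstract does not spell out how such an agentic pipeline is wired together, so the following is only a minimal illustrative sketch of the general pattern (describe the audio, turn the description into a retrieval query, answer from audio plus retrieved evidence). All function names, the canned captions, and the toy keyword retriever are assumptions for illustration; a real system would call an LALM and a web search or vector index instead.

```python
# Hypothetical audio-RAG agent sketch (NOT the paper's actual pipeline):
# 1) an audio-understanding step produces a text description of the clip,
# 2) that description plus the question forms a retrieval query,
# 3) the answer is composed from the description and retrieved evidence.

def describe_audio(audio_clip: str) -> str:
    """Stand-in for an LALM's audio-understanding step.

    A real system would run an audio captioning / reasoning model here;
    this toy version returns a canned description keyed by clip id.
    """
    captions = {
        "clip_01": "a crowd chanting at a football stadium",
        "clip_02": "a violin playing a fast baroque melody",
    }
    return captions.get(audio_clip, "unidentified sound")

def retrieve(query: str, corpus: dict) -> list:
    """Toy keyword retriever over an in-memory corpus.

    Scores each document by how many query words it contains and returns
    matching documents best-first; a real pipeline would issue a web
    search or query a dense-vector index instead.
    """
    words = set(query.lower().split())
    scored = [(sum(w in doc.lower() for w in words), doc)
              for doc in corpus.values()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored if score > 0]

def answer(audio_clip: str, question: str, corpus: dict) -> str:
    """Agent loop: describe the audio, retrieve evidence, compose an answer."""
    description = describe_audio(audio_clip)
    evidence = retrieve(description + " " + question, corpus)
    context = evidence[0] if evidence else "no external evidence found"
    return f"Audio: {description}. Evidence: {context}"

corpus = {
    "doc_a": "Football stadium chants often reference the home club's anthem.",
    "doc_b": "Baroque violin music is characterized by ornamented melodic lines.",
}
print(answer("clip_01", "what kind of event is this?", corpus))
```

The point the benchmark stresses is the middle step: without `retrieve`, the model can only answer from what it already knows about the sound, which is exactly the internal-knowledge setting that existing benchmarks cover.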
Problem

Research questions and friction points this paper is trying to address.

Audio Reasoning
Information Retrieval
Large Audio-Language Models
Benchmark
External Knowledge Grounding
Innovation

Methods, ideas, or system contributions that make the work stand out.

AudioRAG
retrieval-augmented generation
Large Audio-Language Models
audio reasoning
information retrieval