🤖 AI Summary
This work addresses the lack of transparency in current large audio language models during inference, which hinders the evaluation of factual accuracy and logical coherence in their reasoning chains. To tackle this, we organized the Interspeech 2026 Audio Reasoning Challenge—the first systematic benchmark for assessing chain-of-thought reasoning quality in audio models and agents—featuring two tracks (single-model and agent-based) that attracted 156 global teams. We introduce the MMAR-Rubrics protocol for fine-grained, instance-level evaluation of reasoning chains and develop an interpretable audio reasoning framework integrating reinforcement learning, multimodal analysis, tool orchestration, and high-quality data pipelines. Experiments demonstrate that agents significantly enhance reasoning quality through iterative tool invocation and cross-modal analysis, while single models also achieve rapid improvement via reinforcement learning, collectively advancing the transparency and trustworthiness of audio reasoning systems.
📝 Abstract
Recent Large Audio Language Models (LALMs) excel in understanding but often lack transparent reasoning. To address this "black-box" limitation, we organized the Audio Reasoning Challenge at Interspeech 2026, the first shared task dedicated to evaluating Chain-of-Thought (CoT) quality in the audio domain. The challenge introduced MMAR-Rubrics, a novel instance-level protocol assessing the factuality and logic of reasoning chains. Featuring Single Model and Agent tracks, the competition attracted 156 teams from 18 countries and regions. Results show that agent systems currently lead in reasoning quality, leveraging iterative tool orchestration and cross-modal analysis, while single models are rapidly advancing via reinforcement learning and sophisticated data pipelines. We detail the challenge design, methodology, and a comprehensive analysis of state-of-the-art systems, providing new insights for explainable audio intelligence.