PolyBench: A Benchmark for Compositional Reasoning in Polyphonic Audio

📅 2026-03-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the lack of benchmarks for compositional reasoning about concurrent sound events and their relationships in polyphonic audio. It presents PolyBench, described as the first systematically constructed benchmark for this setting, with five subtasks (counting, classification, detection, concurrency judgment, and duration estimation) that assess models' compositional reasoning. Experiments on prominent large audio language models show a consistent performance drop in polyphonic scenarios, exposing the limitations of current models on audio that contains multiple simultaneous sounds.

📝 Abstract
Large Audio Language Models (LALMs) are increasingly capable of reasoning over audio. However, existing benchmarks provide limited coverage of reasoning in polyphonic audio, where multiple sound events co-occur and induce compositional structure. In this work, we introduce PolyBench, a benchmark designed to evaluate compositional reasoning in polyphonic audio. PolyBench comprises five evaluation subsets covering counting, classification, detection, concurrency, and duration estimation, requiring reasoning over multiple concurrent events and their relations. Evaluation of state-of-the-art LALMs reveals consistent performance degradation in polyphonic audio, indicating a fundamental bottleneck in current LALMs.
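
As a rough illustration of how results on the five subsets might be tallied, the sketch below computes per-subset accuracy. Only the five subset names come from the abstract. The record layout (`subset`, `prediction`, `answer`), the function name `accuracy_by_subset`, and the exact-match scoring rule are all assumptions made for illustration, not the benchmark's actual format or metric.

```python
from collections import defaultdict

# Subset names follow the abstract; everything else here is hypothetical.
SUBSETS = ("counting", "classification", "detection", "concurrency", "duration")


def accuracy_by_subset(records):
    """Return {subset: fraction of exact-match answers} for subsets that appear."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        subset = rec["subset"]
        if subset not in SUBSETS:
            raise ValueError(f"unknown subset: {subset!r}")
        total[subset] += 1
        # Assumed scoring rule: case-insensitive exact match after trimming.
        if rec["prediction"].strip().lower() == rec["answer"].strip().lower():
            correct[subset] += 1
    # Report only subsets with at least one record, in the canonical order.
    return {s: correct[s] / total[s] for s in SUBSETS if total[s]}


# Made-up records, purely to show the call shape.
example = [
    {"subset": "counting", "prediction": "3", "answer": "3"},
    {"subset": "counting", "prediction": "2", "answer": "4"},
    {"subset": "concurrency", "prediction": "Yes", "answer": "yes"},
]
scores = accuracy_by_subset(example)
```

Reporting each subset separately, rather than one pooled score, keeps a weakness on one kind of question (for example counting) from being hidden by strength on another.
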
Problem

Research questions and friction points this paper is trying to address.

compositional reasoning
polyphonic audio
audio language models
benchmark
concurrent sound events