AudioRouter: Data Efficient Audio Understanding via RL based Dual Reasoning

📅 2026-02-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses two limitations of large audio language models (LALMs): unreliable performance on fine-grained auditory perception tasks, and the data inefficiency of conventional approaches that internalize perceptual capabilities through extensive training. The authors propose AudioRouter, a reinforcement learning-based framework that formulates external audio tool invocation as an explicit decision-making problem. A lightweight routing policy learns when and how to call specialized audio tools while the underlying reasoning model's parameters stay frozen, so auditory understanding improves without fully internalizing perceptual functions. The authors report substantial improvements on standard audio understanding benchmarks while requiring up to 600 times less training data to learn tool usage than conventional training paradigms.

📝 Abstract

Large Audio Language Models (LALMs) have demonstrated strong capabilities in audio understanding and reasoning. However, their performance on fine-grained auditory perception remains unreliable, and existing approaches largely rely on data-intensive training to internalize perceptual abilities. We propose AudioRouter, a reinforcement learning framework that enables LALMs to improve audio understanding by learning when and how to use external audio tools. Rather than tightly coupling tool usage with audio reasoning, AudioRouter formulates tool use as an explicit decision-making problem and optimizes a lightweight routing policy while keeping the underlying reasoning model frozen. Experimental results show that AudioRouter achieves substantial improvements on standard audio understanding benchmarks while requiring up to 600x less training data to learn tool usage compared with conventional training paradigms. These findings suggest that learning effective tool usage offers a data-efficient and scalable alternative to internalizing perceptual abilities in LALMs.
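
To make the idea of a trainable router sitting beside a frozen reasoner more concrete, here is a toy sketch. It is not the authors' implementation: the abstract does not name the RL algorithm, the tools, or the reward, so the tool names, the query kinds, the `frozen_reasoner_reward` stand-in, and the choice of a plain REINFORCE update on a tabular softmax policy are all assumptions made for illustration.

```python
import math
import random

# Hypothetical action set; "none" means answer without calling any tool.
ACTIONS = ["none", "pitch_tool", "event_tool"]
# Hypothetical query kinds standing in for real audio questions.
QUERY_KINDS = ["general", "pitch", "event"]


def frozen_reasoner_reward(kind, action):
    """Stand-in for the frozen model: 1.0 if its answer would be correct.

    Assumption: each query kind is answered correctly only when the
    router picks the matching action. The reasoner itself is never updated.
    """
    needed = {"general": "none", "pitch": "pitch_tool", "event": "event_tool"}
    return 1.0 if action == needed[kind] else 0.0


class Router:
    """Tabular softmax routing policy; these logits are the only trained state."""

    def __init__(self):
        self.logits = {k: [0.0] * len(ACTIONS) for k in QUERY_KINDS}

    def probs(self, kind):
        row = self.logits[kind]
        m = max(row)  # subtract the max for numerical stability
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        return [e / total for e in exps]

    def choose(self, kind):
        """Greedy action, used after training."""
        row = self.logits[kind]
        return ACTIONS[row.index(max(row))]

    def update(self, kind, action_idx, reward, lr=0.5):
        """REINFORCE step: reward times the gradient of log pi(action | kind)."""
        p = self.probs(kind)
        for i in range(len(ACTIONS)):
            grad = (1.0 if i == action_idx else 0.0) - p[i]
            self.logits[kind][i] += lr * reward * grad


def train(steps=500, seed=0):
    rng = random.Random(seed)
    router = Router()
    for _ in range(steps):
        kind = rng.choice(QUERY_KINDS)
        idx = rng.choices(range(len(ACTIONS)), weights=router.probs(kind))[0]
        reward = frozen_reasoner_reward(kind, ACTIONS[idx])
        router.update(kind, idx, reward)
    return router


router = train()
print({k: router.choose(k) for k in QUERY_KINDS})
```

The point of the sketch is the separation of concerns: the reward comes from the frozen reasoner's outcome, while gradient updates touch only the small routing table. A real system would replace the table with a learned policy over audio and text inputs.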

Problem

Research questions and friction points this paper is trying to address.

Audio Understanding

Large Audio Language Models

Data Efficiency

Fine-grained Auditory Perception

Tool Usage

Innovation

Methods, ideas, or system contributions that make the work stand out.

AudioRouter

Reinforcement Learning

Tool Usage