Context-Aware Query Refinement for Target Sound Extraction: Handling Partially Matched Queries

πŸ“… 2025-09-10
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
In target sound extraction (TSE), performance of existing methods degrades significantly under partial match queries (PMQs)β€”queries containing both active and inactive acoustic classes present in the mixture. To address this, we propose a context-aware query optimization framework that jointly models acoustic event activity and query-audio semantic alignment, enabling dynamic identification and removal of inactive classes from the query for adaptive refinement. This is the first TSE method to explicitly model the PMQ scenario and mitigate its adverse effects. Experiments demonstrate robust performance across fully matching, fully mismatching, and partially matching queries. Our approach achieves substantial gains over state-of-the-art methods on CHiME-5 and AudioSet-TSE benchmarks, with relative improvements of up to 12.7% specifically under PMQ conditions.

Technology Category

Application Category

πŸ“ Abstract
Target sound extraction (TSE) is the task of extracting a target sound specified by a query from an audio mixture. Much prior research has focused on the problem setting under the Fully Matched Query (FMQ) condition, where the query specifies only active sounds present in the mixture. However, in real-world scenarios, queries may include inactive sounds that are not present in the mixture. This leads to scenarios such as the Fully Unmatched Query (FUQ) condition, where only inactive sounds are specified in the query, and the Partially Matched Query (PMQ) condition, where both active and inactive sounds are specified. Among these conditions, the performance degradation under the PMQ condition has been largely overlooked. To achieve robust TSE under the PMQ condition, we propose context-aware query refinement. This method eliminates inactive classes from the query during inference based on the estimated sound class activity. Experimental results demonstrate that while conventional methods suffer from performance degradation under the PMQ condition, the proposed method effectively mitigates this degradation and achieves high robustness under diverse query conditions.
Problem

Research questions and friction points this paper is trying to address.

Addresses performance degradation in partially matched queries
Proposes context-aware refinement for inactive sound elimination
Enhances robustness in target sound extraction scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Context-aware query refinement method
Eliminates inactive classes from query
Estimates sound class activity automatically
πŸ”Ž Similar Papers
No similar papers found.
R
Ryo Sato
RION Co., Ltd., Tokyo, Japan
C
Chiho Haruta
RION Co., Ltd., Tokyo, Japan
N
Nobuhiko Hiruma
RION Co., Ltd., Tokyo, Japan
Keisuke Imoto
Keisuke Imoto
Kyoto University
Acoustic Signal ProcessingSound Event Detection