🤖 AI Summary
To address the scarcity of real-world room impulse responses (RIRs) and the difficulty of acquiring them for speaker distance estimation, this paper introduces, for the first time, a generative data augmentation paradigm and establishes an end-to-end, physically plausible RIR synthesis framework. Methodologically, it integrates deep generative models (e.g., GANs, VAEs, diffusion models) with acoustic prior constraints and differentiable acoustic rendering to produce spatially consistent and physically faithful RIRs. Key contributions include: (1) releasing the first standardized benchmark dataset and evaluation protocol dedicated to RIR generation; (2) open-sourcing the complete implementation pipeline; and (3) proposing a dual-axis evaluation metric that assesses both RIR generation fidelity and downstream distance estimation performance. Experiments demonstrate that the synthesized RIRs substantially improve the generalization and robustness of distance estimation models.
📝 Abstract
This paper describes the Synthesis of Room Acoustics challenge, held as part of the Generative Data Augmentation workshop at ICASSP 2025. The challenge defines a unique generative task designed to increase the quantity and diversity of room impulse response (RIR) datasets so that they can be used for a spatially sensitive downstream task: speaker distance estimation. It identifies the technical difficulty of precisely measuring or simulating the acoustic characteristics of many rooms. As a solution, it proposes generative data augmentation as an alternative that can potentially improve various downstream tasks. The challenge website, dataset, and evaluation code are available at https://sites.google.com/view/genda2025.
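The abstract does not spell out how an augmented RIR set reaches a downstream model. One common route, assumed here and not taken from the challenge's evaluation code, is to convolve dry speech with each RIR to obtain reverberant training audio. The sketch below uses a toy RIR (exponentially decaying noise) as a stand-in for a generated one; `synthetic_rir` and `reverberate` are hypothetical names introduced only for illustration.

```python
import numpy as np

def synthetic_rir(rt60_s, sample_rate, rng):
    """Toy RIR (assumption): noise whose envelope drops 60 dB over rt60_s seconds."""
    n = int(round(rt60_s * sample_rate))
    t = np.arange(n) / sample_rate
    # A 60 dB drop is an amplitude factor of 10**-3 at t == rt60_s.
    envelope = 10.0 ** (-3.0 * t / rt60_s)
    rir = rng.standard_normal(n) * envelope
    rir[0] = 1.0  # crude stand-in for the direct-path arrival
    return rir / np.max(np.abs(rir))  # peak-normalize

def reverberate(dry, rir):
    """Apply a room response to a dry signal by linear convolution."""
    return np.convolve(dry, rir, mode="full")

rng = np.random.default_rng(0)
sample_rate = 16000
dry = rng.standard_normal(sample_rate)      # 1 s of noise as a placeholder for speech
rir = synthetic_rir(0.3, sample_rate, rng)  # placeholder for a generated RIR
wet = reverberate(dry, rir)                 # reverberant signal for training
```

In an actual augmentation pipeline, `rir` would come from the generative model under evaluation and `dry` from a speech corpus; the convolution step stays the same.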