🤖 AI Summary
Sound source separation (SSS) has evolved significantly over three decades, yet a comprehensive, cross-domain historical analysis remains lacking. Method: This paper conducts a systematic literature review of SSS advancements from the 1990s to the 2020s, covering speech, music, and general audio in both single- and multi-channel settings. It integrates bibliometric analysis, critical evaluation of benchmarking frameworks (e.g., BSS_EVAL, MUSDB, LibriMix), and longitudinal trend analysis. Contribution/Results: We propose the first unified, cross-paradigm developmental taxonomy, comprising four stages: Blind Source Separation → Deep Learning → Interpretable Modeling → Embodied Interaction. Crucially, we identify how evaluation culture (metrics, datasets, and challenges) has actively shaped technical trajectories. We define three core challenges for next-generation SSS: causal interpretability, low-resource generalization, and human–machine collaborative separation. Written on the occasion of ICASSP’s 50th anniversary, this work offers a reference review of the field and informs the transition toward post-deep-learning research paradigms.
📝 Abstract
Source separation (SS) of acoustic signals is a research field that emerged in the mid-1990s and has flourished ever since. On the occasion of ICASSP's 50th anniversary, we review the major contributions and advancements in the past three decades in the speech, audio, and music SS research field. We will cover both single- and multi-channel SS approaches. We will also look back on key efforts to foster a culture of scientific evaluation in the research field, including challenges, performance metrics, and datasets. We will conclude by discussing current trends and future research directions.