🤖 AI Summary
This work investigates whether variants of mask-based beamforming (BF) achieve equivalent peak extraction performance under a unified masked BF framework. Prior studies have two limitations: incomplete BF coverage (e.g., exclusion of MVDR) and reliance on ideal scaling (IS). To address them, we propose the first comprehensive, differentiable, and realistic unified framework that encompasses all major BF types, including MVDR, with mask-driven differentiable scaling. We demonstrate, theoretically and empirically and for the first time, that mask design (e.g., IRM, IBM, PSF), time-frequency modeling, and the coupling mechanism between masking and BF critically affect extraction consistency, which challenges the common assumption of performance equivalence. Evaluations on WSJ0-2mix and LibriMix reveal a disparity of up to 12.7 dB in peak extraction error across mask–BF combinations, confirming non-equivalence. Our core contribution is a differentiable, general, and practically viable paradigm for joint mask–BF optimization.
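
As background for the coupling between masking and BF mentioned above, one widely used textbook formulation of mask-driven MVDR can be sketched as follows. It is not taken from the paper, and the paper's own scaling and coupling may differ. All symbols are assumptions introduced here for illustration: x(t,f) is the multichannel STFT vector, M_ν(t,f) is a time-frequency mask for the target (ν = s) or the interference (ν = n), Φ_ν(f) is the resulting spatial covariance matrix, and u is a one-hot vector selecting the reference microphone.

```latex
% Mask-weighted spatial covariance matrices (assumed notation)
\Phi_{\nu}(f) = \frac{\sum_{t} M_{\nu}(t,f)\, \mathbf{x}(t,f)\, \mathbf{x}(t,f)^{\mathsf{H}}}
                     {\sum_{t} M_{\nu}(t,f)}, \qquad \nu \in \{s, n\}

% MVDR weights in the reference-channel form
\mathbf{w}(f) = \frac{\Phi_{n}(f)^{-1}\, \Phi_{s}(f)}
                     {\operatorname{tr}\!\left(\Phi_{n}(f)^{-1}\, \Phi_{s}(f)\right)}\, \mathbf{u}

% Beamformed target estimate
\hat{s}(t,f) = \mathbf{w}(f)^{\mathsf{H}}\, \mathbf{x}(t,f)
```

In this form the mask influences the output only through the two covariance estimates, which is one reason different mask types could plausibly lead to different extraction behavior.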