FlexIO: Flexible Single- and Multi-Channel Speech Separation and Enhancement

📅 2025-10-24

🤖 AI Summary

Existing speech separation and enhancement (SSE) systems are typically constrained to a fixed number of speakers and a fixed microphone array, which limits their adaptability to varying input arrays and output counts. Single-channel systems that handle a variable number of speakers and multi-channel systems that accommodate various array configurations have so far been developed separately. This work proposes FlexIO, an SSE system with flexible inputs and outputs that (1) performs conditional separation with prompt vectors, one per speaker, so that the number of separated outputs is set by the number of prompts; (2) processes multi-channel mixtures together with the prompt vectors through an array-agnostic channel communication mechanism; and (3) handles single- and multi-channel SSE within one model. Experiments show that FlexIO covers conditions with one to five microphones and one to three speakers, and its robustness is confirmed on CHiME-4 real data.

📝 Abstract

Speech separation and enhancement (SSE) has advanced remarkably and achieved promising results in controlled settings, such as a fixed number of speakers and a fixed array configuration. Towards a universal SSE system, single-channel systems have been extended to deal with a variable number of speakers (i.e., outputs). Meanwhile, multi-channel systems accommodating various array configurations (i.e., inputs) have been developed. However, these attempts have been pursued separately. In this paper, we propose a flexible input and output SSE system, named FlexIO. It performs conditional separation using prompt vectors, one per speaker as a condition, allowing separation of an arbitrary number of speakers. Multi-channel mixtures are processed together with the prompt vectors via an array-agnostic channel communication mechanism. Our experiments demonstrate that FlexIO successfully covers diverse conditions with one to five microphones and one to three speakers. We also confirm the robustness of FlexIO on CHiME-4 real data.
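A minimal plain-Python sketch may help make the prompt-based conditional separation concrete. It assumes that each prompt vector scales the mixture features (a FiLM-like conditioning chosen here only for illustration) and that a toy sigmoid mask estimator stands in for the separation network. The names `separate`, `condition_frame`, and `estimate_mask` are hypothetical and are not FlexIO's actual layers. What the sketch illustrates is that one shared separator is run once per prompt, so the number of outputs follows the number of prompts supplied.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def condition_frame(frame: Vector, prompt: Vector) -> Vector:
    # Hypothetical FiLM-like conditioning: scale each mixture feature by the
    # matching prompt entry (an assumption, not the paper's conditioning layer).
    return [f * p for f, p in zip(frame, prompt)]


def estimate_mask(conditioned: Vector) -> Vector:
    # Toy mask estimator: a sigmoid per feature, so every mask value is in (0, 1).
    return [sigmoid(c) for c in conditioned]


def separate(mixture: List[Vector], prompts: List[Vector]) -> List[List[Vector]]:
    """Run the same shared separator once per prompt vector.

    mixture: T frames, each a D-dim feature vector.
    prompts: N prompt vectors of dimension D, one per speaker to extract.
    Returns N estimated sources, each T frames of D features.
    """
    outputs = []
    for prompt in prompts:  # one pass per prompt, so N outputs for N prompts
        source = []
        for frame in mixture:
            mask = estimate_mask(condition_frame(frame, prompt))
            # Apply the mask to the mixture frame to get this speaker's estimate.
            source.append([m * f for m, f in zip(mask, frame)])
        outputs.append(source)
    return outputs
```

For example, passing one, two, or three prompt vectors to `separate` with the same mixture yields one, two, or three estimated sources without any change to the separator itself.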

Problem

Research questions and friction points this paper is trying to address.

Flexible speech separation for variable speaker counts

Universal system handling single/multi-channel microphone inputs

Robust enhancement under diverse real-world acoustic conditions

Innovation

Methods, ideas, or system contributions that make the work stand out.

Flexible speech separation using prompt vectors

Array-agnostic channel communication mechanism

Handles variable microphone and speaker counts
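The page does not spell out how the array-agnostic channel communication works. One known way to exchange information across an arbitrary number of microphones is a transform-average-concatenate (TAC) style block, sketched here as an assumption rather than as FlexIO's actual mechanism: a shared transform is applied to every channel, the results are averaged across channels, and the average is concatenated back onto each channel before a projection. Because the only cross-channel operation is a mean, the block accepts any channel count and does not depend on channel order. `channel_communication`, `transform`, and `project` are hypothetical names, and the scalar weight and fixed projection stand in for learned layers.

```python
import math
from typing import List

Vector = List[float]


def transform(x: Vector, weight: float) -> Vector:
    # Shared per-channel transform (toy: scaled tanh); the same weight is
    # used for every channel, so it works for any number of channels.
    return [math.tanh(weight * v) for v in x]


def project(concat: Vector, dim: int) -> Vector:
    # Toy output projection from 2*D back to D: add the per-channel half and
    # the averaged half (stands in for a learned linear layer).
    return [concat[i] + concat[dim + i] for i in range(dim)]


def channel_communication(channels: List[Vector], weight: float = 1.0) -> List[Vector]:
    """TAC-style exchange across an arbitrary number of channels.

    channels: C >= 1 per-channel feature vectors, each of dimension D.
    Returns C updated vectors of dimension D, in the same channel order.
    """
    if not channels:
        raise ValueError("need at least one channel")
    dim = len(channels[0])
    # Transform: shared weights applied to each channel independently.
    hidden = [transform(ch, weight) for ch in channels]
    # Average: the only cross-channel step, invariant to channel count and order.
    mean = [sum(h[i] for h in hidden) / len(hidden) for i in range(dim)]
    out = []
    for ch, h in zip(channels, hidden):
        # Concatenate the channel's own features with the cross-channel mean.
        update = project(h + mean, dim)
        # Residual connection keeps the original channel information.
        out.append([c + u for c, u in zip(ch, update)])
    return out
```

Calling `channel_communication` on one to five channel vectors returns the same number of updated vectors, and reversing the channel order only reverses the outputs (up to floating-point rounding in the mean).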