FasTUSS: Faster Task-Aware Unified Source Separation

📅 2025-07-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
To reduce the computational cost and inference time of task-aware unified source separation (TUSS) models, this paper analyzes the design choices of TUSS, which is built on the TF-Locoformer architecture, and derives two lower-complexity variants: FasTUSS-8.3G and FasTUSS-11.7G. These reduce the original model's operations by 81% and 73%, with performance drops of 1.2 dB and 0.4 dB averaged over all benchmarks, respectively. The paper also investigates the impact of prompt conditioning in order to derive a causal TUSS model. The core contribution is an improved performance-complexity trade-off for a single, prompt-conditioned separation model.

📝 Abstract
Time-Frequency (TF) dual-path models are currently among the best performing audio source separation network architectures, achieving state-of-the-art performance in speech enhancement, music source separation, and cinematic audio source separation. While they are characterized by a relatively low parameter count, they still require a considerable number of operations, implying a higher execution time. This problem is exacerbated by the trend towards bigger models trained on large amounts of data to solve more general tasks, such as the recently introduced task-aware unified source separation (TUSS) model. TUSS, which aims to solve audio source separation tasks using a single, conditional model, is built upon TF-Locoformer, a TF dual-path model combining convolution and attention layers. The task definition comes in the form of a sequence of prompts that specify the number and type of sources to be extracted. In this paper, we analyze the design choices of TUSS with the goal of optimizing its performance-complexity trade-off. We derive two more efficient models, FasTUSS-8.3G and FasTUSS-11.7G, that reduce the original model's operations by 81% and 73% with minor performance drops of 1.2 dB and 0.4 dB averaged over all benchmarks, respectively. Additionally, we investigate the impact of prompt conditioning to derive a causal TUSS model.
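
The "dual-path" idea mentioned in the abstract can be illustrated with a minimal sketch: a feature tensor is processed first along the frequency axis (within each frame) and then along the time axis (within each frequency bin). This is not TF-Locoformer or TUSS. The function names, tensor shapes, and the parameter-free attention below are assumptions chosen for illustration, and convolution layers, normalization, learned projections, and prompt conditioning are all omitted.

```python
import numpy as np

def self_attention(x):
    """Parameter-free scaled dot-product self-attention.

    x has shape (batch, seq, dim); attention runs over the seq axis.
    Hypothetical helper: real models use learned query/key/value projections.
    """
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(x.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x

def dual_path_block(h):
    """Apply frequency modeling, then temporal modeling, with residual connections.

    h has shape (T, F, D): T frames, F frequency bins, D channels (assumed layout).
    """
    # Frequency path: each frame is a batch item, the sequence runs over F.
    h = h + self_attention(h)
    # Temporal path: each frequency bin is a batch item, the sequence runs over T.
    h_t = h.transpose(1, 0, 2)            # (F, T, D)
    h_t = h_t + self_attention(h_t)
    return h_t.transpose(1, 0, 2)         # back to (T, F, D)

rng = np.random.default_rng(0)
features = rng.standard_normal((50, 65, 16))  # illustrative sizes only
out = dual_path_block(features)
print(out.shape)
```

Because each path attends over only one axis at a time, the attention matrices are F×F per frame and T×T per bin rather than (T·F)×(T·F). Even so, every block touches all T·F positions twice, which is one way to see why such models can have few parameters yet many operations.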

Problem

Research questions and friction points this paper is trying to address.

Optimize performance-complexity trade-off in TUSS models
Reduce operations in TF dual-path source separation networks
Investigate prompt conditioning for causal TUSS model

Innovation

Methods, ideas, or system contributions that make the work stand out.

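Analysis of the design choices of TUSS, which is built on TF-Locoformer, a TF dual-path model combining convolution and attention layers
Two more efficient variants, FasTUSS-8.3G and FasTUSS-11.7G, that reduce the original model's operations by 81% and 73%
Minor performance drops of 1.2 dB and 0.4 dB averaged over all benchmarks, plus a causal TUSS model derived from a study of prompt conditioning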
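
As a quick consistency check on the reported reductions, the short sketch below back-computes the baseline cost implied by each variant. It assumes that the suffix in each model name (8.3G, 11.7G) is that model's operation count in giga-operations for some fixed input; the summary does not state this explicitly, and the variable and function names are illustrative.

```python
# Assumption: the name suffix is the variant's cost in giga-operations.
# The reduction percentages are the ones quoted in the abstract.
variants = {
    "FasTUSS-8.3G": (8.3, 0.81),
    "FasTUSS-11.7G": (11.7, 0.73),
}

def implied_baseline(ops_g, reduction):
    """Baseline cost implied by a variant's cost and its relative reduction."""
    return ops_g / (1.0 - reduction)

baselines = {
    name: implied_baseline(ops_g, reduction)
    for name, (ops_g, reduction) in variants.items()
}

for name, value in baselines.items():
    print(f"{name}: implied original cost of about {value:.1f} G")
```

Under that assumption, both variants point to an original model of roughly 43 to 44 G operations, so the two quoted percentages are mutually consistent.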