🤖 AI Summary
Audio source separation, spanning speech, music, and general audio, faces a fundamental trade-off between separation quality and computational cost. This paper introduces the first discriminative separation framework that is scalable at both training and inference time, enabling flexible speed-accuracy trade-offs by dynamically adjusting inference depth, *without* retraining. The key contributions are: (1) an early-split, multi-loss supervision architecture that provides fine-grained gradient guidance; and (2) a parameter-sharing backbone with a dynamic inference-repetition mechanism, ensuring parameter efficiency and smooth performance scaling with depth. Evaluated on standard speech separation benchmarks, the method achieves state-of-the-art (SOTA) performance with fewer parameters, reduces training cost by 32%, and cuts real-time inference latency by 47%, improving deployment adaptability and energy efficiency.
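The dynamic inference-repetition idea can be sketched as follows. This is a minimal, hypothetical toy in plain Python (the class, weights, and nonlinearity are all made up for illustration, not the paper's actual network): a single weight-shared block is applied repeatedly, and the number of repetitions is a free knob chosen at inference time.

```python
import math

class SharedDepthSeparator:
    """Toy weight-shared refinement backbone (illustrative only)."""

    def __init__(self, dim, n_sources):
        # One shared set of "weights" reused at every depth,
        # plus one projection per output source (early-split heads).
        self.shared = [0.1 * (i + 1) / dim for i in range(dim)]
        self.heads = [[(s + 1) * 0.5] * dim for s in range(n_sources)]

    def refine(self, h):
        # One pass through the shared block: residual + tanh nonlinearity.
        return [x + math.tanh(w * x) for x, w in zip(h, self.shared)]

    def separate(self, mixture, depth):
        # `depth` trades speed for quality at inference, same parameters.
        h = list(mixture)
        for _ in range(depth):
            h = self.refine(h)
        return [[g * x for g, x in zip(head, h)] for head in self.heads]

model = SharedDepthSeparator(dim=4, n_sources=2)
mix = [0.5, -1.0, 2.0, 0.3]
fast = model.separate(mix, depth=2)  # low-latency setting
slow = model.separate(mix, depth=8)  # higher-quality setting, same weights
```

Because every repetition reuses the same parameters, switching between the `fast` and `slow` settings requires no retraining and no extra model storage.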
📝 Abstract
Source separation is a fundamental task in speech, music, and audio processing, and it also provides cleaner and larger data for training generative models. However, improving separation performance in practice often depends on increasingly large networks, inflating training and deployment costs. Motivated by recent advances in inference-time scaling for generative modeling, we propose Training-Time and Inference-Time Scalable Discriminative Source Separation (TISDiSS), a unified framework that integrates early-split multi-loss supervision, shared-parameter design, and dynamic inference repetitions. TISDiSS enables flexible speed-performance trade-offs by adjusting inference depth without retraining additional models. We further provide systematic analyses of architectural and training choices and show that training with more inference repetitions improves shallow-inference performance, benefiting low-latency applications. Experiments on standard speech separation benchmarks demonstrate state-of-the-art performance with a reduced parameter count, establishing TISDiSS as a scalable and practical framework for adaptive source separation.
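The early-split multi-loss supervision described above can be sketched in the same toy style (again plain Python with made-up weights and a simple L2 loss standing in for SI-SDR-style objectives; this is not the paper's actual training code): separation heads, and a loss term, are attached after every repetition of the shared block, so shallow inference depths also receive direct supervision.

```python
import math

SHARED_W = [0.05, 0.10, 0.15, 0.20]  # toy shared-block weights
HEADS = [[0.5] * 4, [1.0] * 4]       # one toy projection per source

def refine(h):
    # One pass of the weight-shared block (residual + tanh).
    return [x + math.tanh(w * x) for x, w in zip(h, SHARED_W)]

def split_heads(h):
    # Early-split heads: every repetition can emit per-source estimates.
    return [[g * x for g, x in zip(head, h)] for head in HEADS]

def multi_depth_loss(mixture, targets, train_depth):
    # Accumulate an L2 loss after every repetition so that shallow
    # depths also receive direct gradient guidance during training.
    h, total = list(mixture), 0.0
    for _ in range(train_depth):
        h = refine(h)
        total += sum(
            sum((e - t) ** 2 for e, t in zip(est, tgt))
            for est, tgt in zip(split_heads(h), targets)
        )
    return total / train_depth

mix = [0.5, -1.0, 2.0, 0.3]
targets = [[0.2, -0.4, 0.8, 0.1], [0.3, -0.6, 1.2, 0.2]]
loss_shallow = multi_depth_loss(mix, targets, train_depth=2)
loss_deep = multi_depth_loss(mix, targets, train_depth=6)
```

Training with a larger `train_depth` exposes the shared block to more supervised repetitions, which is consistent with the abstract's observation that training with more inference repetitions improves shallow-inference performance.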