🤖 AI Summary
This work proposes an end-to-end, data-driven approach for separating a signal of interest from unknown non-Gaussian interference. The method builds on a modified SoundStream architecture that adds transformer layers and uses finite scalar quantization (FSQ) to produce discrete representations of the target signal. The resulting model adapts to the interference type and generalizes zero-shot to unseen mixtures, without side information about the interference. It is trained end-to-end with a cross-entropy loss, which improves substantially on conventional mean-squared error training. On the MIT RF Challenge dataset the method achieves competitive performance, including a 122-fold reduction in bit error rate over prior state-of-the-art techniques when separating a QPSK signal from 5G interference.
📝 Abstract
We study a signal separation problem: estimating a signal of interest (SOI) contaminated by an unknown non-Gaussian background or interference. Given training data consisting of examples of the SOI and the interference, we show how to build a fully data-driven signal separator. To that end, we learn a good discrete tokenizer for the SOI and then train an end-to-end transformer with a cross-entropy loss. Training with a cross-entropy loss shows substantial improvements over the conventional mean-squared error (MSE) loss. Our tokenizer is a modification of Google's SoundStream that incorporates additional transformer layers and switches from VQ-VAE to finite scalar quantization (FSQ). Across real and synthetic mixtures from the MIT RF Challenge dataset, our method achieves competitive performance, including a 122x reduction in bit error rate (BER) over prior state-of-the-art techniques for separating a QPSK signal from 5G interference. The learned representation adapts to the interference type without side information and shows zero-shot generalization to unseen mixtures at inference time, underscoring its potential beyond RF. Although we instantiate our approach on radio-frequency mixtures, we expect the same architecture to apply to gravitational-wave data (e.g., LIGO strain) and other scientific sensing problems that require data-driven modeling of background and noise.
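To make the tokenization step concrete, the following is a minimal NumPy sketch of finite scalar quantization: each latent dimension is bounded, rounded to a small set of integer levels, and the per-dimension codes are combined into a single token index that a transformer could predict under a cross-entropy loss. It is an illustration under stated assumptions, not the authors' implementation: the level counts in `LEVELS`, the function names, and the restriction to odd level counts are all hypothetical choices, and the straight-through gradient used when training a quantizer is not shown.

```python
import numpy as np

# Hypothetical configuration: 3 latent dimensions with 5 levels each,
# giving an implicit codebook of 5 * 5 * 5 = 125 tokens.
# Odd level counts are assumed so the levels are symmetric around zero.
LEVELS = np.array([5, 5, 5])
HALF = (LEVELS - 1) // 2                                  # [2, 2, 2]
BASIS = np.concatenate(([1], np.cumprod(LEVELS[:-1])))    # [1, 5, 25]


def fsq_quantize(z):
    """Bound each latent dimension with tanh, then round to integer codes.

    z has shape (..., len(LEVELS)); the result holds integers in
    [-HALF, HALF] per dimension.
    """
    bounded = np.tanh(z) * HALF
    return np.round(bounded).astype(int)


def codes_to_token(codes):
    """Map per-dimension codes to one token index (mixed-radix encoding)."""
    digits = codes + HALF          # shift to the range [0, LEVELS - 1]
    return (digits * BASIS).sum(axis=-1)


def token_to_codes(token):
    """Invert codes_to_token: recover per-dimension codes from the index."""
    token = np.asarray(token)
    digits = (token[..., None] // BASIS) % LEVELS
    return digits - HALF


# Example: quantize a batch of latent vectors and round-trip through tokens.
rng = np.random.default_rng(0)
z = rng.normal(size=(4, len(LEVELS)))
codes = fsq_quantize(z)
tokens = codes_to_token(codes)
recovered = token_to_codes(tokens)
```

Because the codebook is implicit in the level grid, there are no codebook vectors to learn, which is the main practical difference from a VQ-VAE style quantizer.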