🤖 AI Summary
To address poor denoising and modulation classification accuracy under high-noise conditions, this paper proposes a multimodal masked autoencoder framework that explicitly models noise as an independent input modality. In an unsupervised pre-training stage, the method jointly processes noisy time-domain signals and constellation diagrams and learns, end to end, to reconstruct clean waveforms and noise-free constellation diagrams. Cross-modal learning improves both denoising and modulation recognition. Its key innovation is the first explicit incorporation of noise as a dedicated modality in representation learning. This reduces the amount of labeled data needed (3% fewer fine-tuning samples) and of unlabeled data needed (10% fewer pre-training samples), and it enables robust generalization to unseen, lower SNR regimes. The approach achieves state-of-the-art automatic modulation recognition accuracy and remains robust across a wide SNR range.
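The summary says pre-training uses noisy signals, noisy constellation diagrams, and noise as a separate modality, with clean signals and diagrams as reconstruction targets. A minimal sketch of how one such training example could be constructed follows. It is not the authors' data pipeline. It assumes QPSK symbols, complex AWGN at a chosen SNR, and a 32x32 2D-histogram rendering of the constellation. It also assumes that the "noise modality" is the additive noise realization itself. The helper names `qpsk_symbols`, `add_awgn`, and `constellation_image` are hypothetical.

```python
import numpy as np

def qpsk_symbols(n, rng):
    """Random QPSK symbols with unit average power (assumed modulation)."""
    bits = rng.integers(0, 2, size=(n, 2))
    return ((2 * bits[:, 0] - 1) + 1j * (2 * bits[:, 1] - 1)) / np.sqrt(2)

def add_awgn(x, snr_db, rng):
    """Add complex white Gaussian noise so that P_signal / P_noise = 10^(snr_db / 10)."""
    sig_power = np.mean(np.abs(x) ** 2)
    noise_power = sig_power / (10 ** (snr_db / 10))
    # Split the noise power equally between the I and Q components.
    noise = np.sqrt(noise_power / 2) * (
        rng.standard_normal(x.shape) + 1j * rng.standard_normal(x.shape)
    )
    return x + noise, noise

def constellation_image(x, size=32, lim=2.0):
    """Render I/Q samples as a 2D histogram normalized to [0, 1]; samples outside +/-lim are dropped."""
    img, _, _ = np.histogram2d(x.real, x.imag, bins=size, range=[[-lim, lim], [-lim, lim]])
    return img / img.max() if img.max() > 0 else img

rng = np.random.default_rng(0)
clean = qpsk_symbols(1024, rng)
noisy, noise = add_awgn(clean, snr_db=0.0, rng=rng)

# Input modalities (assumption: the noise realization is fed in as its own modality).
modalities = {
    "noisy_signal": np.stack([noisy.real, noisy.imag]),   # (2, 1024) I/Q channels
    "noisy_constellation": constellation_image(noisy),    # (32, 32) image
    "noise": np.stack([noise.real, noise.imag]),          # (2, 1024) I/Q channels
}
# Reconstruction targets: the noiseless counterparts.
targets = {
    "clean_signal": np.stack([clean.real, clean.imag]),
    "clean_constellation": constellation_image(clean),
}
```

Rendering the constellation as a fixed-size image lets an image-style patch encoder consume it alongside the time-domain I/Q channels. The choice of histogram resolution and axis limits is arbitrary here.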
📝 Abstract
We propose the Denoising Masked Autoencoder (DenoMAE), a novel multimodal autoencoder framework for denoising modulation signals during pretraining. DenoMAE extends masked autoencoders to multiple input modalities, including noise as an explicit modality, to enhance cross-modal learning and improve denoising performance. The network is pretrained on unlabeled noisy modulation signals and constellation diagrams and learns to reconstruct their noiseless counterparts. DenoMAE achieves state-of-the-art accuracy on automatic modulation classification while using fewer training samples than existing approaches: 10% less unlabeled pretraining data and 3% less labeled fine-tuning data. Moreover, the model performs robustly across a range of signal-to-noise ratios (SNRs) and extrapolates to unseen lower SNRs. These results indicate that DenoMAE is a flexible and data-efficient solution for denoising and classifying modulation signals in noise-intensive environments.
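The masked-autoencoder idea the abstract builds on is to split each modality into patch tokens, hide most of them, and score reconstruction only on the hidden ones. It can be sketched in NumPy as shown below. This is an illustrative sketch, not DenoMAE's implementation. The patch sizes, the 0.75 mask ratio (borrowed from the original image MAE), and the helpers `patchify_1d`, `patchify_2d`, `random_mask`, and `masked_mse` are all assumptions. Random arrays stand in for real data, and no encoder or decoder network is shown.

```python
import numpy as np

def patchify_1d(x, patch_len):
    """Split a (channels, length) signal into (num_patches, channels * patch_len) tokens."""
    c, n = x.shape
    if n % patch_len:
        raise ValueError("length must be divisible by patch_len")
    p = n // patch_len
    return x.reshape(c, p, patch_len).transpose(1, 0, 2).reshape(p, c * patch_len)

def patchify_2d(img, patch):
    """Split an (H, W) image into row-major (num_patches, patch * patch) tokens."""
    h, w = img.shape
    if h % patch or w % patch:
        raise ValueError("image size must be divisible by patch")
    return (img.reshape(h // patch, patch, w // patch, patch)
               .transpose(0, 2, 1, 3)
               .reshape(-1, patch * patch))

def random_mask(num_tokens, mask_ratio, rng):
    """MAE-style masking: shuffle token indices and keep a (1 - mask_ratio) fraction visible."""
    num_keep = int(round(num_tokens * (1 - mask_ratio)))
    perm = rng.permutation(num_tokens)
    return np.sort(perm[:num_keep]), np.sort(perm[num_keep:])

def masked_mse(pred_tokens, target_tokens, masked_idx):
    """Mean squared reconstruction error computed only on the masked tokens."""
    diff = pred_tokens[masked_idx] - target_tokens[masked_idx]
    return float(np.mean(diff ** 2))

rng = np.random.default_rng(0)
# Random stand-ins for one pretraining example (hypothetical shapes).
noisy_signal = rng.standard_normal((2, 1024))   # noisy I/Q signal
noisy_constellation = rng.random((32, 32))      # noisy constellation image
clean_signal = rng.standard_normal((2, 1024))   # noiseless reconstruction target

sig_tokens = patchify_1d(noisy_signal, patch_len=16)    # (64, 32)
img_tokens = patchify_2d(noisy_constellation, patch=4)  # (64, 16)
vis_sig, masked_sig = random_mask(len(sig_tokens), 0.75, rng)
vis_img, masked_img = random_mask(len(img_tokens), 0.75, rng)

# Only the visible tokens of each modality would be passed to the (omitted) encoder.
encoder_input = {
    "signal": sig_tokens[vis_sig],          # (16, 32)
    "constellation": img_tokens[vis_img],   # (16, 16)
}

# The loss compares decoder output with *clean* target patches at masked positions.
clean_tokens = patchify_1d(clean_signal, patch_len=16)
baseline_loss = masked_mse(np.zeros_like(clean_tokens), clean_tokens, masked_sig)
```

Because the target tokens come from the noiseless signal rather than from the noisy input, the reconstruction objective doubles as a denoising objective. In a full model, each modality would also need its own linear projection to a shared embedding width before the tokens could be concatenated.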