🤖 AI Summary
This study addresses the high acquisition cost and scarcity of labeled radar micro-Doppler spectrograms. We propose a purely data-driven cross-modal synthesis method that generates radar time-frequency spectrograms directly from human motion capture (MoCap) sequences. To our knowledge, this is the first work to employ a Transformer architecture for end-to-end time-series-to-image modeling of radar signals. Using sliding windows and spatiotemporal self-attention, the model captures both the spatial topology of body joints and their motion dynamics, without relying on physical radar simulations or prior domain knowledge. Experimental results demonstrate that the synthesized spectrograms achieve superior visual fidelity, higher PSNR and SSIM scores, and improved generalization in downstream tasks compared to conventional physics-based models, while significantly reducing computational overhead. The approach enables data augmentation and low-resource deployment for edge IoT radar systems.
📝 Abstract
We present a purely machine-learning-based process for synthesizing radar spectrograms from motion-capture (MoCap) data. We formulate MoCap-to-spectrogram translation as a windowed sequence-to-sequence task using a transformer-based model that jointly captures spatial relations among MoCap markers and temporal dynamics across frames. Real-world experiments show that the proposed approach produces visually and quantitatively plausible Doppler radar spectrograms and generalizes well. Ablation experiments show that the learned model can both convert multi-part motion into Doppler signatures and represent the spatial relations between different parts of the human body. The result is an interesting example of using transformers for time-series signal processing, and it is especially applicable to edge computing and Internet of Things (IoT) radars. It also suggests that scarce radar datasets can be augmented with more abundant MoCap data when training higher-level applications. Finally, the approach requires far less computation than physics-based methods for generating radar data.
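As a rough illustration of the "windowed" part of the formulation, the sketch below splits a MoCap frame sequence into overlapping fixed-length windows, each of which could then be fed to a sequence model. The abstract does not state the window length, hop size, or data layout, so `sliding_windows`, `window`, `hop`, and the toy clip are all hypothetical choices made for this example and not the authors' implementation.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def sliding_windows(frames: Sequence[Frame], window: int, hop: int) -> List[List[Frame]]:
    """Split a frame sequence into overlapping fixed-length windows.

    `window` and `hop` are illustrative parameters (assumed, not from the paper).
    Trailing frames that do not fill a complete window are dropped.
    """
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    last_start = len(frames) - window
    return [list(frames[s:s + window]) for s in range(0, last_start + 1, hop)]


# Toy MoCap clip (assumed layout): 10 frames, 2 markers, (x, y, z) per marker.
clip = [[(float(t), 0.0, 0.0), (0.0, float(t), 0.0)] for t in range(10)]

# Windows of 4 frames, advancing 2 frames at a time.
windows = sliding_windows(clip, window=4, hop=2)
```

In a full pipeline each window would presumably be paired with the matching slice of the measured spectrogram as the training target; that pairing depends on the radar and MoCap sampling rates, which are not given here.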