🤖 AI Summary
This work addresses the challenge of intuitively visualizing WiFi channel state information (CSI) in through-wall non-line-of-sight (NLoS) scenarios. We propose the first cross-modal generation framework mapping raw CSI to visual images. Methodologically, we design a multimodal variational autoencoder (VAE) tailored to through-wall radio propagation characteristics: it jointly models time-frequency CSI features and incorporates RF physics-informed priors to enable end-to-end reconstruction of human activity heatmaps or silhouettes. Experiments demonstrate that the reconstructed images clearly capture human contours and dynamic motion structure; both qualitative inspection and quantitative evaluation confirm effectiveness, while ablation studies validate the critical role of each architectural component. To our knowledge, this is the first work to achieve interpretable, CSI-driven through-wall visual perception, establishing a novel paradigm for camera-free indoor monitoring and enabling image-level downstream tasks.
📝 Abstract
This work presents a first approach for synthesizing images from WiFi Channel State Information (CSI) in through-wall scenarios. Leveraging the strengths of WiFi, such as cost-effectiveness, illumination invariance, and wall-penetrating capability, our approach enables visual monitoring of indoor environments beyond room boundaries and without the need for cameras. More generally, it improves the interpretability of WiFi CSI by unlocking the option to perform image-based downstream tasks, e.g., visual activity recognition. To achieve this cross-modal translation from WiFi CSI to images, we rely on a multimodal Variational Autoencoder (VAE) adapted to our problem specifics. We extensively evaluate the proposed methodology through an ablation study on architecture configuration and a quantitative and qualitative assessment of the reconstructed images. Our results demonstrate the viability of the method and highlight its potential for practical applications.
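To make the cross-modal pipeline concrete, the sketch below shows the basic VAE machinery the abstract refers to: a CSI time-frequency matrix is encoded into a Gaussian latent, a sample is drawn via the reparameterization trick, and a decoder emits a pixel heatmap. This is a minimal illustration with randomly initialized linear layers and invented dimensions, not the paper's actual architecture, training objective, or physics-informed priors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not from the paper):
# 30 subcarriers x 100 time steps in, a 24x24 heatmap out.
N_SUB, N_TIME, LATENT, IMG_H, IMG_W = 30, 100, 16, 24, 24
D_IN, D_OUT = N_SUB * N_TIME, IMG_H * IMG_W

# Random weights stand in for trained parameters.
W_mu = rng.normal(0, 0.01, (D_IN, LATENT))
W_logvar = rng.normal(0, 0.01, (D_IN, LATENT))
W_dec = rng.normal(0, 0.01, (LATENT, D_OUT))

def encode(csi):
    """Map a complex CSI time-frequency matrix to a Gaussian latent."""
    x = np.abs(csi).reshape(-1)       # amplitude features, flattened
    return x @ W_mu, x @ W_logvar     # mean and log-variance

def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps (kept differentiable in a real framework)."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z):
    """Map a latent sample to a heatmap in [0, 1] via a sigmoid output layer."""
    logits = z @ W_dec
    return (1.0 / (1.0 + np.exp(-logits))).reshape(IMG_H, IMG_W)

# One forward pass on a synthetic complex CSI sample.
csi = rng.normal(size=(N_SUB, N_TIME)) + 1j * rng.normal(size=(N_SUB, N_TIME))
mu, logvar = encode(csi)
heatmap = decode(reparameterize(mu, logvar))
```

In a trained model, the encoder and decoder would be deep (e.g., convolutional) networks optimized with a reconstruction loss plus a KL-divergence term; the multimodal aspect means image and CSI encoders share the latent space so that CSI alone can drive image generation at test time.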