🤖 AI Summary
This paper addresses action semantic understanding from sparse millimeter-wave (mmWave) radar point-cloud sequences, a capability that is particularly critical in privacy-sensitive applications such as healthcare monitoring and smart homes. Methodologically, the authors propose the first end-to-end radar-perception-oriented action language generation framework, comprising: (1) a motion-guided Aggregate VQ-VAE radar tokenizer that incorporates deformable body templates and masked trajectory modeling to learn compact, semantically discriminative radar representations; (2) a radar-aware language model that establishes cross-modal alignment between radar and text in a shared embedding space; and (3) a physics-driven radar–text synthesis pipeline that mitigates the scarcity of annotated real-world data. Evaluated on both synthetic and real-world benchmarks, the approach achieves state-of-the-art performance, significantly improving the accuracy and interpretability of generated action descriptions. The code and models will be publicly released.
📝 Abstract
Millimeter-wave radar provides a privacy-preserving solution for human motion analysis, yet its sparse point clouds pose significant challenges for semantic understanding. We present Radar-LLM, the first framework that leverages large language models (LLMs) for human motion understanding using millimeter-wave radar as the sensing modality. Our approach introduces two key innovations: (1) a motion-guided radar tokenizer based on our Aggregate VQ-VAE architecture that incorporates deformable body templates and masked trajectory modeling to encode spatiotemporal point clouds into compact semantic tokens, and (2) a radar-aware language model that establishes cross-modal alignment between radar and text in a shared embedding space. To address data scarcity, we introduce a physics-aware synthesis pipeline that generates realistic radar-text pairs from motion-text datasets. Extensive experiments demonstrate that Radar-LLM achieves state-of-the-art performance across both synthetic and real-world benchmarks, enabling accurate translation of millimeter-wave signals to natural language descriptions. This breakthrough facilitates comprehensive motion understanding in privacy-sensitive applications such as healthcare and smart homes. We will release the full implementation to support further research at https://inowlzy.github.io/RadarLLM/.
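To make the tokenizer idea concrete, the sketch below illustrates the generic vector-quantization step at the heart of any VQ-VAE-style tokenizer: each encoder output vector is snapped to its nearest codebook entry, and the entry's index becomes a discrete token that a language model can consume. All shapes, the codebook size, and the NumPy implementation are illustrative assumptions, not details from the paper (which builds on a motion-guided Aggregate VQ-VAE with additional components).

```python
import numpy as np

def quantize(latents: np.ndarray, codebook: np.ndarray):
    """Map each latent vector (T, D) to the nearest of K codebook rows (K, D).

    Returns the discrete token indices (T,) and the quantized vectors (T, D).
    Hypothetical sketch of VQ-VAE quantization, not the paper's exact model.
    """
    # Pairwise squared distances between every latent and every code vector.
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    tokens = dists.argmin(axis=1)     # one discrete token per time step
    quantized = codebook[tokens]      # replace each latent by its code vector
    return tokens, quantized

rng = np.random.default_rng(0)
codebook = rng.normal(size=(512, 64))   # assumed: K=512 codes of dimension 64
latents = rng.normal(size=(20, 64))     # assumed: 20 frames of encoder output
tokens, quantized = quantize(latents, codebook)
print(tokens.shape, quantized.shape)    # (20,) (20, 64)
```

In a full VQ-VAE, this lookup is paired with a straight-through gradient estimator and codebook/commitment losses so the encoder and codebook can be trained end to end; the token sequence is what would be aligned with text in the shared embedding space.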