🤖 AI Summary
This paper addresses action semantic understanding from sparse millimeter-wave (mmWave) radar point-cloud sequences, a capability that is particularly critical in privacy-sensitive applications such as healthcare monitoring and smart homes. Methodologically, the authors propose the first end-to-end radar-perception-oriented action language generation framework, comprising: (1) a motion-guided Aggregate VQ-VAE radar tokenizer that incorporates deformable body templates and masked trajectory modeling to learn compact, semantically discriminative radar representations; (2) a radar-aware language model that establishes cross-modal alignment between radar and text in a shared embedding space; and (3) a physics-driven radar–text synthesis pipeline that mitigates the scarcity of annotated real-world data. Evaluated on both synthetic and real-world benchmarks, the approach achieves state-of-the-art performance, significantly improving the accuracy and interpretability of generated action descriptions. The code and models will be publicly released.
📝 Abstract
Millimeter-wave radar provides a privacy-preserving solution for human motion analysis, yet its sparse point clouds pose significant challenges for semantic understanding. We present Radar-LLM, the first framework that leverages large language models (LLMs) for human motion understanding using millimeter-wave radar as the sensing modality. Our approach introduces two key innovations: (1) a motion-guided radar tokenizer based on our Aggregate VQ-VAE architecture that incorporates deformable body templates and masked trajectory modeling to encode spatiotemporal point clouds into compact semantic tokens, and (2) a radar-aware language model that establishes cross-modal alignment between radar and text in a shared embedding space. To address data scarcity, we introduce a physics-aware synthesis pipeline that generates realistic radar-text pairs from motion-text datasets. Extensive experiments demonstrate that Radar-LLM achieves state-of-the-art performance across both synthetic and real-world benchmarks, enabling accurate translation of millimeter-wave signals to natural language descriptions. This breakthrough facilitates comprehensive motion understanding in privacy-sensitive applications such as healthcare and smart homes. We will release the full implementation to support further research at https://inowlzy.github.io/RadarLLM/.
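To make the tokenizer idea concrete, the sketch below illustrates the generic vector-quantization step at the heart of any VQ-VAE-style tokenizer: each encoder output vector is snapped to its nearest codebook entry, and the entry's index becomes a discrete token that a language model can consume. All shapes, the codebook size, and the NumPy implementation are illustrative assumptions, not details from the paper (which builds on a motion-guided Aggregate VQ-VAE with additional components).

```python
import numpy as np

def quantize(latents: np.ndarray, codebook: np.ndarray):
    """Map each latent vector (T, D) to the nearest of K codebook rows (K, D).

    Returns the discrete token indices (T,) and the quantized vectors (T, D).
    Hypothetical sketch of VQ-VAE quantization, not the paper's exact model.
    """
    # Pairwise squared distances between every latent and every code vector.
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    tokens = dists.argmin(axis=1)     # one discrete token per time step
    quantized = codebook[tokens]      # replace each latent by its code vector
    return tokens, quantized

rng = np.random.default_rng(0)
codebook = rng.normal(size=(512, 64))   # assumed: K=512 codes of dimension 64
latents = rng.normal(size=(20, 64))     # assumed: 20 frames of encoder output
tokens, quantized = quantize(latents, codebook)
print(tokens.shape, quantized.shape)    # (20,) (20, 64)
```

In a full VQ-VAE, this lookup is paired with a straight-through gradient estimator and codebook/commitment losses so the encoder and codebook can be trained end to end; the token sequence is what would be aligned with text in the shared embedding space.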