🤖 AI Summary
This work addresses the poor generalization and high retraining burden of millimeter-wave radar-based human activity recognition in real-world scenarios, where annotation costs are prohibitive and domain shifts are prevalent. To overcome these challenges, the authors propose a training-free deployment framework that reformulates activity recognition as an evidence-based reasoning process grounded in a reusable radar knowledge base. This knowledge base is constructed via cross-modal semantic transfer, and recognition is performed through similarity retrieval in a physically aligned, explicit kinematic space. The framework further incorporates multi-agent structured reasoning and a zero-gradient self-evolution mechanism. Without requiring fine-tuning or retraining on target domains, the method achieves 93.39% accuracy on a self-collected dataset, demonstrating substantially improved cross-domain generalization.
📝 Abstract
Millimeter-wave (mmWave) radar enables privacy-preserving human activity recognition (HAR), yet real-world deployment remains hindered by costly annotation and poor transferability under domain shift. Although prior efforts partially alleviate these challenges, most still require retraining or adaptation for each new deployment setting. This keeps mmWave HAR in a repeated collect-tune-redeploy cycle, making scalable real-world deployment difficult. In this paper, we present RAGent, a framework for mmWave HAR that is training-free at deployment time and reformulates recognition as evidence-grounded inference over reusable radar knowledge rather than deployment-specific model optimization. Offline, RAGent constructs a reusable radar knowledge base through constrained cross-modal supervision, in which a Vision-Language Model (VLM) transfers activity semantics from synchronized videos to paired radar segments without manual radar annotation. At deployment time, RAGent recognizes activities from radar alone by retrieving physically comparable precedents in an explicit kinematic space and resolving the final label through structured multi-role reasoning. The reasoning protocol is further refined offline through zero-gradient self-evolution. Extensive experiments on a self-collected dataset show that RAGent achieves 93.39% accuracy without per-domain retraining or target-domain adaptation, while generalizing robustly across domains.
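To make the retrieval-then-resolution idea concrete, here is a minimal sketch of recognizing an activity by nearest-neighbor retrieval over a labeled kinematic feature store. All names, feature dimensions, and values below are hypothetical illustrations, not the paper's actual knowledge base or reasoning protocol; the multi-role reasoning step is stood in for by a simple majority vote.

```python
import numpy as np

# Hypothetical knowledge base: each entry pairs a kinematic feature vector
# (e.g., statistics derived from radar, such as speed or vertical motion)
# with an activity label transferred from synchronized video.
knowledge_base = {
    "walking": np.array([[1.2, 0.9, 0.1], [1.1, 1.0, 0.2]]),
    "sitting": np.array([[0.1, 0.5, 0.0], [0.2, 0.4, 0.1]]),
    "falling": np.array([[2.5, 0.2, 1.8], [2.3, 0.3, 1.7]]),
}

def retrieve_precedents(query, kb, k=3):
    """Return the k most similar (label, score) precedents by cosine similarity."""
    scored = []
    for label, feats in kb.items():
        for f in feats:
            score = float(f @ query / (np.linalg.norm(f) * np.linalg.norm(query)))
            scored.append((label, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

def recognize(query, kb, k=3):
    """Resolve a label by majority vote over the retrieved precedents
    (a simplified stand-in for structured multi-role reasoning)."""
    votes = {}
    for label, _ in retrieve_precedents(query, kb, k):
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

# A query segment whose kinematic features resemble the walking precedents.
print(recognize(np.array([1.15, 0.95, 0.15]), knowledge_base))  # → walking
```

Because recognition reduces to retrieval against a fixed store, adapting to a new deployment in this scheme means extending the knowledge base, not re-optimizing model weights.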