WirelessSenseLLM: Zero-Shot Human Activity Understanding by Bridging Wireless Signals and Human Language

📅 2026-05-13

🤖 AI Summary

This work addresses zero-shot understanding of human activities from unsegmented Wi-Fi channel state information (CSI), a task that existing methods handle poorly because they depend on precise signal segmentation and predefined action labels. The authors propose a language-driven zero-shot sensing framework that uses a CSI-to-Language adapter and a cross-modal projection mechanism to map raw temporal CSI features end-to-end into a semantic space aligned with large language models, so that the model can directly generate fine-grained natural language descriptions. The framework requires no segmented training data, and it supports disentangling overlapping actions and language-based reasoning while bridging the modality gap and handling ambiguous action boundaries. Experiments report 92% accuracy and a 91% F1 score in zero-shot action recognition, improvements of 30% in factual correctness and 15% in reasoning quality of the generated language, and an average 12.33% gain over existing methods in multi-person activity interpretation.

📝 Abstract

There is growing interest in enabling wireless sensing systems to interpret human motion from unsegmented wireless signals; however, existing CSI-based applications rely heavily on accurate signal segmentation and predefined action labels, limiting their applicability in zero-shot scenarios. We present WirelessSenseLLM, a language-driven framework that leverages large language models (LLMs) to enable zero-shot human motion understanding from unsegmented Wi-Fi Channel State Information (CSI). To bridge the modality gap between time-series CSI and discrete language representations, we introduce a CSI-to-Language Adapter and a cross-modal projection mechanism that maps CSI features into a language-aligned semantic space. This design enables the generation of fine-grained natural language descriptions of sequential and overlapping human motions, supporting downstream reasoning without segmented training data. We address two core technical challenges: modality mismatch between CSI features and language embeddings, and overlapping actions in unsegmented CSI streams. Extensive experiments demonstrate strong performance in zero-shot action understanding (92% accuracy and 91% F1-score), language-based reasoning quality (30% factual and 15% reasoning improvements), and multi-person motion explanation with an average 12.33% improvement over prior methods. These results highlight WirelessSenseLLM's effectiveness for robust and interpretable human motion understanding from CSI signals.
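The abstract does not specify how the CSI-to-Language Adapter or the projection is built, so the following is only a rough sketch of the general idea under stated assumptions: an unsegmented CSI stream is mean-pooled over sliding windows, and each pooled vector is linearly projected into the embedding dimension of a language model to form a sequence of "soft tokens". The function names (`pool_windows`, `make_projection`, `project`), the window and stride values, the 30-subcarrier input, and the 64-dimensional embedding are all hypothetical, and the random weights stand in for parameters that would be learned.

```python
import random

def make_projection(in_dim, out_dim, seed=0):
    """Random, untrained in_dim x out_dim matrix standing in for a learned projection."""
    rng = random.Random(seed)
    scale = in_dim ** -0.5
    return [[rng.gauss(0.0, scale) for _ in range(out_dim)] for _ in range(in_dim)]

def pool_windows(csi, window, stride):
    """Mean-pool CSI frames (one list of per-subcarrier amplitudes per frame)
    over sliding windows, so no explicit action segmentation is needed."""
    pooled = []
    for start in range(0, len(csi) - window + 1, stride):
        chunk = csi[start:start + window]
        # zip(*chunk) iterates over subcarriers; average each across the window.
        pooled.append([sum(col) / window for col in zip(*chunk)])
    return pooled

def project(features, weights):
    """Map each pooled feature vector into the (assumed) language embedding space."""
    columns = list(zip(*weights))  # out_dim columns, each of length in_dim
    return [[sum(x * w for x, w in zip(vec, col)) for col in columns]
            for vec in features]

# Synthetic stream: 100 frames x 30 subcarriers (values are placeholders, not real CSI).
rng = random.Random(1)
csi_stream = [[rng.random() for _ in range(30)] for _ in range(100)]

pooled = pool_windows(csi_stream, window=20, stride=10)
soft_tokens = project(pooled, make_projection(in_dim=30, out_dim=64))
print(len(soft_tokens), len(soft_tokens[0]))  # 9 64
```

In a design of this kind, the resulting vectors would be placed alongside the text prompt's token embeddings so that the language model can attend to them when generating a description. Overlapping windows are one simple way to avoid committing to hard action boundaries.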

Problem

Research questions and friction points this paper is trying to address.

zero-shot activity recognition

wireless sensing

unsegmented CSI

modality gap

overlapping actions

Innovation

Methods, ideas, or system contributions that make the work stand out.

Zero-shot learning

Wireless sensing

Large Language Models (LLMs)