SensorLLM: Aligning Large Language Models with Motion Sensors for Human Activity Recognition

📅 2024-10-14
🏛️ arXiv.org
📈 Citations: 4
Influential: 0
🤖 AI Summary
Large language models (LLMs) struggle to process raw motion sensor time-series data due to semantic sparsity, numerical input incompatibility, and computational constraints. To address this, we propose SensorLLM, a two-stage sensor-to-language alignment framework. Its core contributions are: (1) channel-specific special tokens coupled with auto-generated trend-oriented textual descriptions, enabling semantic encoding of multichannel, variable-length numeric sequences; and (2) an integrated pipeline combining textualized sequence representation, special token embedding, instruction tuning, and task-aware LoRA adaptation, which adapts the aligned model to human activity recognition (HAR). Evaluated across multiple benchmarks, SensorLLM matches or surpasses state-of-the-art performance, demonstrating high accuracy, cross-device transferability, and strong generalization.
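The "auto-generated trend-oriented textual descriptions" idea can be illustrated with a minimal sketch: convert a numeric sensor channel into segment-wise trend text an LLM can read. The function name, channel label, and threshold below are illustrative assumptions, not the paper's actual implementation.

```python
def describe_trends(values, channel="acc_x", eps=0.05):
    """Summarize a 1-D sensor sequence as segment-wise trend text.

    Hypothetical sketch: labels each step as rising, falling, or steady
    (relative to the eps threshold), then collapses consecutive repeats
    into phrases like "rises for 2 step(s)".
    """
    segments = []
    for prev, curr in zip(values, values[1:]):
        delta = curr - prev
        if delta > eps:
            segments.append("rises")
        elif delta < -eps:
            segments.append("falls")
        else:
            segments.append("stays steady")
    # Collapse runs of identical labels into one phrase each
    parts = []
    run_label, run_len = segments[0], 1
    for label in segments[1:]:
        if label == run_label:
            run_len += 1
        else:
            parts.append(f"{run_label} for {run_len} step(s)")
            run_label, run_len = label, 1
    parts.append(f"{run_label} for {run_len} step(s)")
    return f"Channel {channel}: the signal " + ", then ".join(parts) + "."
```

For example, `describe_trends([0.0, 0.2, 0.4, 0.41, 0.1])` yields a description noting the signal rises for 2 steps, stays steady for 1, then falls for 1. In the actual framework such text is paired with channel-specific special tokens rather than consumed raw.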

📝 Abstract
We introduce SensorLLM, a two-stage framework that enables Large Language Models (LLMs) to perform human activity recognition (HAR) from sensor data. Despite their strong reasoning and generalization capabilities, LLMs remain underutilized for motion sensor data due to the lack of semantic context in time-series, computational constraints, and challenges in processing numerical inputs. SensorLLM addresses these limitations through a Sensor-Language Alignment stage, where we introduce special tokens for each sensor channel and automatically generate textual trend descriptions. This alignment enables LLMs to capture numerical variations, channel-specific features, and data of varying duration, without requiring human annotations. In the subsequent Task-Aware Tuning stage, we refine the model for HAR classification, achieving performance that matches or surpasses state-of-the-art methods. Our results demonstrate that SensorLLM evolves into an effective sensor learner, reasoner, and classifier through Sensor-Language Alignment, generalizing across diverse HAR datasets. We believe this work establishes a foundation for future research on time-series and text alignment, paving the way for foundation models in sensor data analysis.
Problem

Research questions and friction points this paper is trying to address.

Raw motion sensor time-series lack semantic context, making them hard for LLMs to interpret.
LLMs handle numerical, multichannel, variable-length sensor inputs poorly.
Annotating sensor data for alignment is costly, and computational constraints limit direct use of LLMs on raw signals.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sensor-Language Alignment with channel-specific special tokens
Automatic generation of trend-oriented textual descriptions
Task-Aware Tuning with LoRA for HAR classification
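To make the special-token idea concrete, here is a hypothetical sketch of how per-channel trend descriptions might be wrapped in channel-specific tokens and assembled into an instruction prompt for HAR classification. The channel names, token format, and activity labels are assumptions for illustration, not the paper's actual prompt design.

```python
# Assumed accelerometer channel names for this sketch
CHANNELS = ["acc_x", "acc_y", "acc_z"]

def build_prompt(trend_texts, labels=("walking", "sitting", "standing")):
    """Wrap each channel's trend description in its own special tokens
    and append an instruction asking the model to classify the activity."""
    body = "\n".join(
        f"<{ch}>{text}</{ch}>" for ch, text in zip(CHANNELS, trend_texts)
    )
    choices = ", ".join(labels)
    return (
        "Below are trend descriptions of a motion sensor window.\n"
        f"{body}\n"
        f"Which activity best matches? Options: {choices}."
    )
```

In practice the special tokens would be registered with the tokenizer so they receive dedicated embeddings, and the model would be fine-tuned on such prompts via instruction tuning with LoRA adapters.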