Exploring the Capabilities of LLMs for IMU-based Fine-grained Human Activity Understanding

📅 2025-04-02
🤖 AI Summary
This work examines whether large language models (LLMs) can recognize fine-grained human activities, specifically air-written letters and words, from inertial measurement unit (IMU) data. Pretrained LLMs perform at near-random accuracy on this task. For flat-surface (2D) writing, fine-tuning on a self-collected dataset combined with few-shot learning yields up to a 129× improvement. For mid-air (3D) writing, an encoder-based pipeline maps 3D data into 2D equivalents while preserving spatiotemporal information, and the end-to-end pipeline reaches 78% accuracy on words of up to 5 letters.

📝 Abstract
Human activity recognition (HAR) using inertial measurement units (IMUs) increasingly leverages large language models (LLMs), yet existing approaches focus on coarse activities like walking or running. Our preliminary study indicates that pretrained LLMs fail catastrophically on fine-grained HAR tasks such as air-written letter recognition, achieving only near-random guessing accuracy. In this work, we first bridge this gap for flat-surface writing scenarios: by fine-tuning LLMs with a self-collected dataset and few-shot learning, we achieved up to a 129x improvement on 2D data. To extend this to 3D scenarios, we designed an encoder-based pipeline that maps 3D data into 2D equivalents, preserving the spatiotemporal information for robust letter prediction. Our end-to-end pipeline achieves 78% accuracy on word recognition with up to 5 letters in mid-air writing scenarios, establishing LLMs as viable tools for fine-grained HAR.

Problem

Research questions and friction points this paper is trying to address.

Exploring LLMs for fine-grained human activity recognition using IMUs
Improving LLM performance on air-written letter recognition tasks
Extending 2D writing recognition to 3D scenarios with an encoder-based pipeline

Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuning LLMs with self-collected dataset
Encoder-based pipeline for 3D to 2D mapping
Few-shot learning for improved accuracy
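
The few-shot setup listed above can be sketched as a prompt builder that places a handful of labeled trajectories before an unlabeled query. This is an illustration under stated assumptions, not the authors' prompt: the page does not say how samples are serialized, so the `(x,y)` text format, the instruction sentence, and the function names `serialize_stroke` and `build_few_shot_prompt` are all hypothetical. No model is called; the sketch only produces the text that would be sent.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def serialize_stroke(points: Sequence[Point], decimals: int = 2) -> str:
    """Render a 2D trajectory as a compact, space-separated text sequence."""
    return " ".join(f"({x:.{decimals}f},{y:.{decimals}f})" for x, y in points)


def build_few_shot_prompt(
    examples: List[Tuple[Sequence[Point], str]],
    query: Sequence[Point],
) -> str:
    """Assemble labeled examples followed by one unlabeled query."""
    lines = [
        "Each trajectory is a sequence of (x,y) points traced while writing one letter."
    ]
    for points, label in examples:
        lines.append(f"Trajectory: {serialize_stroke(points)}")
        lines.append(f"Letter: {label}")
    # The query ends with an empty label slot for the model to complete.
    lines.append(f"Trajectory: {serialize_stroke(query)}")
    lines.append("Letter:")
    return "\n".join(lines)


# Toy data, invented for illustration only (not from the paper's dataset).
examples = [
    ([(0.0, 1.0), (0.0, 0.0), (0.5, 0.0)], "L"),
    ([(0.0, 1.0), (0.0, 0.0)], "I"),
]
query = [(0.0, 1.0), (0.0, 0.1), (0.6, 0.0)]
prompt = build_few_shot_prompt(examples, query)
```

Rounding to a fixed number of decimals keeps the prompt short, at the cost of discarding fine detail. How much precision is worth keeping is something to determine empirically.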