🤖 AI Summary
Existing human activity recognition (HAR) methods are trained on fixed activity sets and depend on fine-tuning, so they generalize poorly to new activities. Zero-shot LLM-based approaches typically convert sensor signals into text or images, a lossy step that limits accuracy and interpretability. This paper proposes ZARA, which it describes as the first agent-based framework for zero-shot, interpretable HAR directly from raw motion time series. ZARA combines an automatically constructed pairwise feature knowledge base, a multi-sensor evidence retrieval module, and a hierarchical agent pipeline that guides a large language model to select features, consult the retrieved evidence, and produce activity predictions with natural-language explanations. On eight HAR benchmarks, it achieves state-of-the-art zero-shot macro-F1, exceeding the strongest baseline by 2.53×. Ablation studies confirm the necessity of each component. The implementation is publicly available.
📝 Abstract
Motion sensor time series are central to human activity recognition (HAR), with applications in health, sports, and smart devices. However, existing methods are trained for fixed activity sets and require costly retraining when new behaviours or sensor setups appear. Recent attempts to use large language models (LLMs) for HAR, typically by converting signals into text or images, suffer from limited accuracy and lack verifiable interpretability. We propose ZARA, the first agent-based framework for zero-shot, explainable HAR directly from raw motion time series. ZARA integrates an automatically derived pairwise feature knowledge base that captures discriminative statistics for every activity pair, a multi-sensor retrieval module that surfaces relevant evidence, and a hierarchical agent pipeline that guides the LLM to iteratively select features, draw on this evidence, and produce both activity predictions and natural-language explanations. ZARA enables flexible and interpretable HAR without any fine-tuning or task-specific classifiers. Extensive experiments on 8 HAR benchmarks show that ZARA achieves state-of-the-art zero-shot performance, delivering clear reasoning while exceeding the strongest baselines by 2.53× in macro F1. Ablation studies further confirm the necessity of each module, marking ZARA as a promising step toward trustworthy, plug-and-play motion time-series analysis. Our code is available at https://github.com/zechenli03/ZARA.
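To make the idea of a pairwise feature knowledge base more concrete, here is a minimal Python sketch. It is not ZARA's implementation: the three summary features, the separability score, and the names `extract_features` and `build_pairwise_kb` are all assumptions chosen for illustration. It only shows one plausible way to record, for each activity pair, which simple statistics best tell the two activities apart.

```python
from itertools import combinations
from statistics import mean, pstdev


def extract_features(window):
    """Summary statistics for one 1-D sensor window (illustrative feature set)."""
    return {
        "mean": mean(window),
        "std": pstdev(window),
        "range": max(window) - min(window),
    }


def build_pairwise_kb(labelled_windows, top_k=2):
    """Rank features by how well they separate each pair of activities.

    labelled_windows maps an activity name to a list of windows (lists of floats).
    Returns a dict mapping (activity_a, activity_b) to a list of
    (feature_name, score) tuples, best first. The score is a hypothetical
    heuristic: gap between class means divided by the summed spread.
    """
    # Collect feature values per activity, grouped by feature name.
    per_activity = {}
    for activity, windows in labelled_windows.items():
        feats = [extract_features(w) for w in windows]
        per_activity[activity] = {
            name: [f[name] for f in feats] for name in feats[0]
        }

    kb = {}
    for a, b in combinations(sorted(per_activity), 2):
        scores = []
        for name in per_activity[a]:
            xa, xb = per_activity[a][name], per_activity[b][name]
            spread = pstdev(xa) + pstdev(xb) + 1e-9  # avoid division by zero
            scores.append((name, abs(mean(xa) - mean(xb)) / spread))
        scores.sort(key=lambda s: s[1], reverse=True)
        kb[(a, b)] = scores[:top_k]
    return kb


# Toy accelerometer-like windows, invented for this example.
toy = {
    "sitting": [[0.0, 0.1, 0.0, 0.1], [0.1, 0.2, 0.1, 0.0]],
    "walking": [[-1.0, 1.0, -1.0, 1.2], [-1.2, 1.2, -0.8, 1.2]],
}
kb = build_pairwise_kb(toy)
for pair, ranked in kb.items():
    print(pair, ranked)
```

In this toy data both activities have nearly the same average signal level, so the variability features rank above the mean. A per-pair ranking of this kind is the sort of compact, human-readable evidence that could be passed to an LLM alongside a query window.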