MotionTeller: Multi-modal Integration of Wearable Time-Series with LLMs for Health and Behavioral Understanding

📅 2025-12-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of directly translating minute-level wearable accelerometer time series into natural language behavioral summaries without fine-tuning the large language model (LLM) backbone. The proposed end-to-end framework maps raw sensor signals into the LLM token space via a lightweight projection module, which is trained under pure text supervision while the decoder-only LLM (e.g., Llama, Phi) remains frozen. Key contributions include: (1) a parameter-efficient wearable-to-token projection architecture; (2) a large-scale, real-world activity–text paired dataset derived from NHANES, comprising 54,383 aligned samples; and (3) a training protocol that converges within 15 epochs to a final loss of 0.38. Quantitative evaluation yields BERTScore-F1 = 0.924 and ROUGE-1 = 0.722, outperforming prompt-engineering baselines by 7% in ROUGE-1. Qualitative analysis confirms accurate modeling of circadian patterns and behavioral transitions.
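
As a rough illustration of contribution (1), the sketch below implements one plausible form of the wearable-to-token projection: a single trainable linear layer that maps actigraphy-encoder outputs into the frozen LLM's embedding space. The class name, dimensions, and per-timestep design are illustrative assumptions, not the paper's exact module.

```python
import torch
import torch.nn as nn

class WearableToTokenProjector(nn.Module):
    """Maps actigraphy-encoder outputs into the frozen LLM's token-embedding space."""

    def __init__(self, encoder_dim: int = 256, llm_dim: int = 4096):
        super().__init__()
        # A single linear map keeps the module parameter-efficient:
        # only encoder_dim * llm_dim (+ bias) weights are ever trained.
        self.proj = nn.Linear(encoder_dim, llm_dim)

    def forward(self, encoder_seq: torch.Tensor) -> torch.Tensor:
        # encoder_seq: (batch, n_steps, encoder_dim), one embedding per
        # encoded window of minute-level accelerometer data.
        return self.proj(encoder_seq)  # (batch, n_steps, llm_dim)
```

Keeping the trainable surface this small is what allows the LLM backbone to stay frozen while the projector learns to map behavioral embeddings into its input space.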

📝 Abstract
As wearable sensing becomes increasingly pervasive, a key challenge remains: how can we generate natural language summaries from raw physiological signals such as actigraphy (minute-level movement data collected via accelerometers)? In this work, we introduce MotionTeller, a generative framework that natively integrates minute-level wearable activity data with large language models (LLMs). MotionTeller combines a pretrained actigraphy encoder with a lightweight projection module that maps behavioral embeddings into the token space of a frozen decoder-only LLM, enabling free-text, autoregressive generation of daily behavioral summaries. We construct a novel dataset of 54,383 (actigraphy, text) pairs derived from real-world NHANES recordings, and train the model using cross-entropy loss with supervision only on the language tokens. MotionTeller achieves high semantic fidelity (BERTScore-F1 = 0.924) and lexical accuracy (ROUGE-1 = 0.722), outperforming prompt-based baselines by 7 percent in ROUGE-1. The average training loss converges to 0.38 by epoch 15, indicating stable optimization. Qualitative analysis confirms that MotionTeller captures circadian structure and behavioral transitions, while PCA plots reveal enhanced cluster alignment in embedding space post-training. Together, these results position MotionTeller as a scalable, interpretable system for transforming wearable sensor data into fluent, human-centered descriptions, opening new pathways for behavioral monitoring, clinical review, and personalized health interventions.
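
A minimal sketch of the text-only supervision described in the abstract, assuming a Hugging Face-style causal LM interface: projected sensor embeddings are prepended to the text embeddings, and label positions over the sensor prefix are set to -100 so that only language tokens contribute to the cross-entropy loss. All names and shapes here are illustrative.

```python
import torch

IGNORE_INDEX = -100  # positions with this label are skipped by the loss

def build_inputs_and_labels(sensor_embeds, text_ids, embed_layer):
    """sensor_embeds: (B, P, D) projected prefix; text_ids: (B, T) summary tokens."""
    text_embeds = embed_layer(text_ids)                        # (B, T, D)
    inputs_embeds = torch.cat([sensor_embeds, text_embeds], dim=1)
    prefix_labels = torch.full(sensor_embeds.shape[:2], IGNORE_INDEX,
                               dtype=torch.long, device=text_ids.device)
    labels = torch.cat([prefix_labels, text_ids], dim=1)       # loss on text only
    return inputs_embeds, labels

# One training step (only the projector's parameters sit in the optimizer;
# the LLM backbone stays frozen):
#   out = llm(inputs_embeds=inputs_embeds, labels=labels)
#   out.loss.backward(); optimizer.step(); optimizer.zero_grad()
```
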
Problem

Research questions and friction points this paper is trying to address.

Generates natural language summaries from raw wearable activity data
Integrates minute-level physiological signals with large language models
Transforms sensor data into human-centered behavioral descriptions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pretrained actigraphy encoder maps minute-level movement data
Lightweight projection module aligns embeddings with LLM token space
Generates free-text daily summaries via a frozen decoder-only LLM (see the generation sketch below)
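
One plausible wiring of these three pieces at inference time, assuming a frozen Hugging Face decoder-only checkpoint. The model name, prompt, and generation settings below are placeholders, not the paper's configuration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.2-1B"  # placeholder Llama-family checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
llm = AutoModelForCausalLM.from_pretrained(model_name)
for p in llm.parameters():
    p.requires_grad = False  # backbone stays frozen; only the projector trains

@torch.no_grad()
def summarize(sensor_prefix_embeds: torch.Tensor, prompt: str) -> str:
    """sensor_prefix_embeds: (1, P, llm_dim) output of the projection module."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    prompt_embeds = llm.get_input_embeddings()(prompt_ids)
    inputs_embeds = torch.cat([sensor_prefix_embeds, prompt_embeds], dim=1)
    attention_mask = torch.ones(inputs_embeds.shape[:2], dtype=torch.long)
    out_ids = llm.generate(inputs_embeds=inputs_embeds,
                           attention_mask=attention_mask,
                           max_new_tokens=128)
    # With inputs_embeds (and no input_ids), generate returns only new tokens.
    return tokenizer.decode(out_ids[0], skip_special_tokens=True)

# Hypothetical usage, given a trained projector and a day of encoded activity:
#   summary = summarize(projector(day_embeddings), "Describe this day's activity:")
```
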
Aiwei Zhang
Center for Technology and Behavioral Health, Geisel School of Medicine, Dartmouth College, Lebanon, NH, United States
Arvind Pillai
Department of Computer Science, Dartmouth College, Hanover, NH, United States
Andrew Campbell
Department of Computer Science, Dartmouth College, Hanover, NH, United States
Nicholas C. Jacobson
Dartmouth College
Digital Phenotyping · Digital Interventions · Artificial Intelligence · Mental Health · Chatbots