LSM-2: Learning from Incomplete Wearable Sensor Data

📅 2025-06-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Wearable sensor data frequently exhibit high proportions of non-random missingness, which severely undermines self-supervised learning (SSL) performance. To address this, we propose LSM-2, a novel SSL framework that eliminates the need for imputation. LSM-2 introduces Adaptive and Inherited Masking (AIM), the first mechanism to jointly model natural missingness and artificial masking, giving the model explicit awareness of real-world missing patterns and enabling robust representation learning. It incorporates learnable mask tokens to support multimodal temporal modeling and is pretrained at scale on 40 million hours of real-world sensor data. Evaluated across classification, regression, and generative tasks, LSM-2 consistently outperforms state-of-the-art methods. Notably, it maintains strong generalization and inference consistency under high missingness rates and in clinically relevant scenarios, such as hypertension prediction from nighttime biosignals, substantially enhancing its clinical deployability and reliability.

📝 Abstract
Foundation models, a cornerstone of recent advancements in machine learning, have predominantly thrived on complete and well-structured data. Wearable sensor data frequently suffers from significant missingness, posing a substantial challenge for self-supervised learning (SSL) models that typically assume complete data inputs. This paper introduces the second generation of Large Sensor Model (LSM-2) with Adaptive and Inherited Masking (AIM), a novel SSL approach that learns robust representations directly from incomplete data without requiring explicit imputation. AIM's core novelty lies in its use of learnable mask tokens to model both existing ("inherited") and artificially introduced missingness, enabling it to robustly handle fragmented real-world data during inference. Pre-trained on an extensive dataset of 40M hours of day-long multimodal sensor data, our LSM-2 with AIM achieves the best performance across a diverse range of tasks, including classification, regression and generative modeling. Furthermore, LSM-2 with AIM exhibits superior scaling performance, and critically, maintains high performance even under targeted missingness scenarios, reflecting clinically coherent patterns, such as the diagnostic value of nighttime biosignals for hypertension prediction. This makes AIM a more reliable choice for real-world wearable data applications.
Problem

Research questions and friction points this paper is trying to address.

Handling incomplete wearable sensor data for self-supervised learning
Learning robust representations without explicit data imputation
Maintaining performance under clinically coherent missingness patterns
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive and Inherited Masking for incomplete data
Learnable mask tokens model missingness
Pre-trained on 40M hours of sensor data
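The masking idea above can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy, not the paper's actual implementation: the patch layout, 50% masking ratio, and the constant standing in for the learnable mask-token embedding are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy day-long sensor sequence: 16 time patches x 4 channels,
# with natural ("inherited") missingness marked as NaN.
x = rng.normal(size=(16, 4))
x[3:6, :] = np.nan          # e.g., device off-wrist
x[10, 2] = np.nan           # a single dropped channel

# 1) Inherited mask: positions already missing in the raw data.
inherited = np.isnan(x)

# 2) Artificial mask: additionally hide a fraction of the *observed*
#    positions so pretraining has reconstruction targets.
observed_idx = np.argwhere(~inherited)
n_artificial = int(0.5 * len(observed_idx))
chosen = observed_idx[rng.choice(len(observed_idx), n_artificial, replace=False)]
artificial = np.zeros_like(inherited)
artificial[chosen[:, 0], chosen[:, 1]] = True

# 3) Union mask: every masked position is replaced by a mask token
#    (a learnable embedding in the real model; a constant here).
union = inherited | artificial
MASK_TOKEN = 0.0
x_in = np.where(union, MASK_TOKEN, x)

# The reconstruction loss would be computed only on artificially masked
# positions, since inherited-missing positions have no ground truth.
assert not np.isnan(x_in).any()
```

The key design point this sketch captures is that inherited and artificial missingness are handled by one unified masking mechanism, so the model never needs imputed values at the masked positions.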
Maxwell A. Xu
University of Illinois Urbana-Champaign, Google Research
Girish Narayanswamy
UbiComp Lab, University of Washington
Health Sensing, Signal Processing, Machine Learning, Artificial Intelligence, Embedded Systems
Kumar Ayush
Google | Stanford University | Indian Institute of Technology Kharagpur
Foundation Models, Large Language Models, Generative AI, RLHF
Dimitris Spathis
Google Research and University of Cambridge
machine learning, self-supervised learning, multimodal learning, human-centered AI, health sensing
Shun Liao
Google Research
Shyam A. Tailor
Senior Research Scientist, Google
Machine Learning, Wearable Devices, Mobile Technology
Ahmed Metwally
Google Research
A. Ali Heydari
Google Research
Yuwei Zhang
Google Research
Jake Garrison
Google Research
Samy Abdel-Ghaffar
Google Research
Xuhai Xu
Assistant Professor, Columbia University | Google
Human-Computer Interaction, Ubiquitous Computing, Human-Centered AI, mHealth, Health Informatics
Ken Gu
Paul G. Allen School of Computer Science & Engineering, University of Washington
Data Science, Natural Language Processing, Human-Computer Interaction
Jacob Sunshine
Google Research
Ming-Zher Poh
Google, MIT
machine learning, physiological sensing, wearable sensors, mobile health, computational physiology
Yun Liu
Google Research
Tim Althoff
Associate Professor of Computer Science, University of Washington
Human AI Interaction, Natural Language Processing, Behavioral Data Science, AI for Mental Health
Shrikanth Narayanan
Google DeepMind
Pushmeet Kohli
DeepMind
AI for Science, Machine Learning, AI Safety, Computer Vision, Program Synthesis
Mark Malhotra
Google Research
Shwetak N. Patel
Google Research
Yuzhe Yang
Google Research
James M. Rehg
Founder Professor of Computer Science, University of Illinois at Urbana-Champaign
computer vision, robotics, machine learning, human-computer interaction, parallel and distributed
Xin Liu
Google Research
Daniel McDuff
Google and University of Washington
Affective Computing, Deep Learning, Human-Computer Interaction, Human-Centered AI, Computer Vision