Generalist vs Specialist Time Series Foundation Models: Investigating Potential Emergent Behaviors in Assessing Human Health Using PPG Signals

📅 2025-10-15
🤖 AI Summary
This study benchmarks generalist against specialist time series foundation models for photoplethysmography (PPG)-based health assessment across 51 tasks covering cardiac state assessment, laboratory value estimation, and cross-modal inference. The models are compared along seven dimensions: win score, average performance, feature quality, tuning gain, performance variance, transferability, and scalability. Under full fine-tuning, the specialist model achieves a 27% higher win score than the generalist. The paper also analyzes generalization, fairness, attention visualizations, and the influence of training data choice.

📝 Abstract
Foundation models are large-scale machine learning models that are pre-trained on massive amounts of data and can be adapted to various downstream tasks. They have been applied extensively in natural language processing and computer vision, with models such as GPT, BERT, and CLIP, and are now gaining attention in time-series analysis, particularly for physiological sensing. However, most time series foundation models are specialists: their pre-training and test data are of the same type, such as electrocardiogram, electroencephalogram, or photoplethysmogram (PPG). Recent works, such as MOMENT, instead train a generalist time series foundation model on data from multiple domains, such as weather, traffic, and electricity. This paper presents a comprehensive benchmarking study that compares generalist and specialist models, with a focus on PPG signals. Using a suite of 51 tasks covering cardiac state assessment, laboratory value estimation, and cross-modal inference, we evaluate both models across seven dimensions: win score, average performance, feature quality, tuning gain, performance variance, transferability, and scalability. These metrics jointly capture not only the models' capability but also their adaptability, robustness, and efficiency under different fine-tuning strategies, providing a holistic understanding of their strengths and limitations for diverse downstream scenarios. In a full-tuning scenario, we demonstrate that the specialist model achieves a 27% higher win score. Finally, we provide further analysis on generalization, fairness, attention visualizations, and the importance of training data choice.
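The abstract does not spell out how the win score is computed. As a rough illustration only, the sketch below assumes it is the share of tasks on which one model outperforms the other, with ties counted as half a win. The function name `win_score`, the task names, and every number are hypothetical placeholders, not values or definitions taken from the paper.

```python
def win_score(scores_a, scores_b, higher_is_better):
    """Share of tasks on which model A beats model B; a tie counts as half a win.

    Assumed definition for illustration only; the paper may define it differently.
    """
    wins = 0.0
    for task, a in scores_a.items():
        b = scores_b[task]
        if a == b:
            wins += 0.5
        elif (a > b) == higher_is_better[task]:
            # A is ahead when it is larger on a higher-is-better metric,
            # or smaller on a lower-is-better metric.
            wins += 1.0
    return wins / len(scores_a)


# Hypothetical tasks with made-up scores (not results from the paper).
higher_is_better = {
    "af_detection_auroc": True,   # classification: higher AUROC is better
    "potassium_mae": False,       # regression: lower error is better
    "resp_rate_mae": False,
}
specialist = {"af_detection_auroc": 0.90, "potassium_mae": 0.40, "resp_rate_mae": 2.0}
generalist = {"af_detection_auroc": 0.85, "potassium_mae": 0.45, "resp_rate_mae": 1.8}

print(f"specialist win score: {win_score(specialist, generalist, higher_is_better):.2f}")
print(f"generalist win score: {win_score(generalist, specialist, higher_is_better):.2f}")
```

Because classification and regression tasks use metrics that point in opposite directions, the sketch tracks the direction of each metric per task instead of comparing raw numbers directly.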
Problem

Research questions and friction points this paper is trying to address.

Compares generalist vs specialist models for PPG-based health assessment
Evaluates 51 tasks spanning cardiac state assessment, laboratory value estimation, and cross-modal inference
Assesses model adaptability, robustness and efficiency in healthcare applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Compares generalist and specialist time series models
Evaluates models using PPG signals for health assessment
Assesses performance across seven different evaluation dimensions
Authors
Saurabh Kataria
Nell Hodgson Woodruff School of Nursing, Emory University
Yi Wu
School of Computer Science, University of Oklahoma
Zhaoliang Chen
Postdoc, Hong Kong Baptist University (24-26); Ph.D. & B.E., Fuzhou University (15-24)
Machine learning, graph neural networks, multi-view learning, low-rank approximation, matrix
Hyunjung Gloria Kwak
Nell Hodgson Woodruff School of Nursing, Emory University
Yuhao Xu
Department of Computer Science, Emory University
Lovely Yeswanth Panchumarthi
Department of Computer Science, Emory University
Ran Xiao
Nell Hodgson Woodruff School of Nursing, Emory University
Jiaying Lu
Research Assistant Professor, Center for Data Science, School of Nursing, Emory University
AI for Healthcare, Knowledge Graph, Multimodal Learning, Large Language Model
Ayca Ermis
Georgia Institute of Technology
Systems Control, Predictive Modeling
Anni Zhao
Nell Hodgson Woodruff School of Nursing, Emory University
Runze Yan
Emory University
Digital Health, Machine Learning and Data Mining
Alex Federov
Nell Hodgson Woodruff School of Nursing, Emory University
Zewen Liu
Emory University
Machine Learning, Graph Neural Networks, Epidemic Modeling
Xu Wu
School of Computer Science, University of Oklahoma
Wei Jin
Department of Computer Science, Emory University
Carl Yang
Waymo LLC, PhD at University of California, Davis
GPU Computing, Parallel Computing, Graph Processing
Jocelyn Grunwell
Emory University School of Medicine & Children's Healthcare of Atlanta at Egleston
Pediatric Intensive Care, Acute Respiratory Distress Syndrome, Neutrophil Activation, Sepsis, Sedation
Stephanie R. Brown
Department of Pediatrics, Emory University School of Medicine
Amit Shah
Associate Professor, Emory University
Cardiology, Epidemiology
Craig Jabaley
Department of Anesthesiology, Emory University School of Medicine
Tim Buchman
Department of Surgery, Emory University School of Medicine
Sivasubramanium V Bhavani
Department of Medicine, Emory University School of Medicine
Randall J. Lee
School of Medicine, University of California, San Francisco
Xiao Hu
Nell Hodgson Woodruff School of Nursing, Emory University