CoDaS: AI Co-Data-Scientist for Biomarker Discovery via Wearable Sensors

📅 2026-04-16

🤖 AI Summary
This study addresses the challenge of systematically and reliably discovering clinically meaningful digital biomarkers from continuous physiological signals captured by wearable devices. The authors propose CoDaS, a multi-agent framework that integrates hypothesis generation, statistical analysis, adversarial validation, and literature-grounded reasoning into an iterative, human-supervised workflow. The approach aims to make automated biomarker discovery traceable, verifiable, and grounded in domain expertise. Evaluated on three cohorts totaling 9,279 participant-observations, CoDaS identified 41 candidate biomarkers for mental health and 25 for metabolic outcomes. It surfaced circadian-instability features associated with depression in two independent cohorts (sleep duration variability in DWB, ρ = 0.252; sleep onset variability in GLOBEM, ρ = 0.126; both p < 0.001). Adding CoDaS-derived features to demographic variables yielded modest but consistent cross-validated gains in predictive performance for depression and insulin resistance (ΔR² = 0.040 and 0.021, respectively).
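The ΔR² figures above compare a model using demographic variables alone with one that also includes the derived features. The following is a minimal sketch of that comparison, not the authors' pipeline: it assumes ordinary least squares and interleaved k-fold splits, neither of which the source specifies, and every name and number in it is hypothetical toy data.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(rows, targets):
    """Least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(coefs, row):
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


def cross_validated_r2(rows, targets, k=5):
    """R^2 computed from out-of-fold predictions pooled over k folds."""
    n = len(rows)
    preds = [0.0] * n
    for fold in range(k):
        # Assumption: interleaved folds (sample i belongs to fold i % k).
        train = [i for i in range(n) if i % k != fold]
        coefs = fit_ols([rows[i] for i in train], [targets[i] for i in train])
        for i in range(fold, n, k):
            preds[i] = predict(coefs, rows[i])
    mean_y = sum(targets) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(targets, preds))
    ss_tot = sum((t - mean_y) ** 2 for t in targets)
    return 1.0 - ss_res / ss_tot


def delta_r2(baseline_rows, extra_rows, targets, k=5):
    """Cross-validated R^2 gain from appending extra features to the baseline."""
    full_rows = [list(b) + list(e) for b, e in zip(baseline_rows, extra_rows)]
    return cross_validated_r2(full_rows, targets, k) - cross_validated_r2(
        baseline_rows, targets, k
    )


# Hypothetical toy data: age as the only demographic variable, plus one
# derived feature that the outcome depends on exactly.
ages = [[20.0 + i] for i in range(30)]
derived = [[float((i * 7) % 11)] for i in range(30)]
outcome = [0.1 * a[0] + 0.5 * d[0] for a, d in zip(ages, derived)]

gain = delta_r2(ages, derived, outcome)
```

In this toy setup the outcome is an exact linear function of both inputs, so the full model fits almost perfectly and `gain` is positive. Real gains, such as the 0.040 and 0.021 reported here, are far smaller.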

📝 Abstract
Scientific discovery in digital health requires converting continuous physiological signals from wearable devices into clinically actionable biomarkers. We introduce CoDaS (AI Co-Data-Scientist), a multi-agent system that structures biomarker discovery as an iterative process combining hypothesis generation, statistical analysis, adversarial validation, and literature-grounded reasoning with human oversight using large-scale wearable datasets. Across three cohorts totaling 9,279 participant-observations, CoDaS identified 41 candidate digital biomarkers for mental health and 25 for metabolic outcomes, each subjected to an internal validation battery spanning replication, stability, robustness, and discriminative power. Across two independent depression cohorts, CoDaS surfaced circadian instability-related features in both datasets, reflected in sleep duration variability (DWB, ρ = 0.252, p < 0.001) and sleep onset variability (GLOBEM, ρ = 0.126, p < 0.001). In a metabolic cohort, CoDaS derived a cardiovascular fitness index (steps/resting heart rate; ρ = -0.374, p < 0.001), and recovered established clinical associations, including the hepatic function ratio (AST/ALT; ρ = -0.375, p < 0.001), a known correlate of insulin resistance. Incorporating CoDaS-derived features alongside demographic variables led to modest but consistent improvements in predictive performance, with cross-validated ΔR² increases of 0.040 for depression and 0.021 for insulin resistance. These findings suggest that CoDaS enables systematic and traceable hypothesis generation and prioritization for biomarker discovery from large-scale wearable data.
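To make the reported features concrete, here is a minimal sketch of how a sleep duration variability score and a steps/resting-heart-rate index could be computed and correlated with an outcome. It assumes the reported ρ values are Spearman rank correlations (the abstract does not name the statistic) and that variability is a population standard deviation. The function names and the toy data are hypothetical and do not come from the paper.

```python
from statistics import mean, pstdev


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def sleep_duration_variability(nightly_hours):
    """Assumed definition: population std. dev. of nightly sleep duration."""
    return pstdev(nightly_hours)


def fitness_index(daily_steps, resting_hr):
    """Assumed definition: mean daily steps divided by resting heart rate."""
    return mean(daily_steps) / resting_hr


# Hypothetical toy data, one entry per participant (not from the paper).
nightly_sleep_hours = [
    [7.0, 7.1, 6.9, 7.0],
    [6.0, 8.5, 5.0, 9.0],
    [7.5, 6.5, 8.0, 6.0],
    [5.0, 9.5, 4.0, 10.0],
]
depression_score = [2, 11, 6, 15]

variability = [sleep_duration_variability(n) for n in nightly_sleep_hours]
rho = spearman_rho(variability, depression_score)
```

The toy data is perfectly monotone, so `rho` comes out at 1.0. The correlations reported in the abstract (0.126 to 0.252 for the sleep features) reflect much noisier real-world relationships.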
Problem

Research questions and friction points this paper is trying to address.

digital biomarker
wearable sensors
biomarker discovery
mental health
metabolic outcomes
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-agent system
digital biomarker discovery
wearable sensors
adversarial validation
hypothesis generation
Yubin Kim
MIT
Health AI, AI Safety, Agents
Salman Rahman
University of California Los Angeles
Machine Learning, Natural Language Processing, Language Modeling
Samuel Schmidgall
Google DeepMind
AI Agents, LLM agents, Large Language Models, Medical AI
Chunjong Park
Google DeepMind
A. Ali Heydari
Google Research
Ahmed A. Metwally
Google Research
Hong Yu
Google Research
Xin Liu
Google
Computer Networks and Distributed Systems
Xuhai Xu
Assistant Professor, Columbia University | Google
Human-Computer Interaction, Ubiquitous Computing, Human-Centered AI, mHealth, Health Informatics
Yuzhe Yang
Google Research
Maxwell A. Xu
Google Research
Zhihan Zhang
PhD student, University of Notre Dame
Natural Language Processing
Cynthia Breazeal
Professor Media Arts and Sciences, MIT Media Lab
Social Robotics, Artificial Intelligence, Human-Computer Interaction, AI Literacy
Tim Althoff
Associate Professor of Computer Science, University of Washington
Human AI Interaction, Natural Language Processing, Behavioral Data Science, AI for Mental Health
Petar Sirkovic
Google Cloud AI
Ivor Rendulic
Google Cloud AI
Annalisa Pawlosky
Google Research
Nicolas Stroppa
Google
Machine Learning, Natural Language Processing
Juraj Gottweis
Google Cloud AI
Elahe Vedadi
Google DeepMind
AI, Distributed Computing, Information Theory, Secure & Private Computing
Alan Karthikesalingam
Google Health
Artificial Intelligence in Healthcare
Pushmeet Kohli
DeepMind
AI for Science, Machine Learning, AI Safety, Computer Vision, Program Synthesis
Vivek Natarajan
Research Lead, Google DeepMind
Deep Learning, Healthcare, Computer Vision, Natural Language Processing, Artificial Intelligence
Mark Malhotra
Google Research
Shwetak Patel
University of Washington, Washington Research Foundation Endowed Professor, Computer Science
Ubiquitous Computing, Human-Computer Interaction, Sensors, Embedded Systems