Publications
Language Models Still Struggle to Zero-shot Reason about Time Series, EMNLP, 2024
BLADE: Benchmarking Language Model Agents for Data-Driven Science, EMNLP, 2024
Are Language Models Actually Useful for Time Series Forecasting?, NeurIPS [Spotlight], 2024
Transforming Wearable Data into Health Insights using Large Language Model Agents, Preprint, 2024
Homekit2020: A Benchmark for Time Series Classification on a Large Mobile Sensing Dataset with Laboratory Tested Ground Truth of Influenza Infections, CHIL, 2023
Self-supervised Pretraining and Transfer Learning Enable Flu and COVID-19 Predictions in Small Mobile Sensing Datasets, CHIL, 2023
CORAL: COde RepresentAtion Learning with Weakly-Supervised Transformers for Analyzing Data Analysis, EPJ Data Science, 2022
GLOBEM Dataset: Multi-Year Datasets for Longitudinal Human Behavior Modeling Generalization, NeurIPS, 2022
MULTIVERSE: Mining Collective Data Science Knowledge from Code on the Web to Suggest Alternative Analysis Approaches, KDD, 2021
CrossCheck: Integrating self-report, behavioral sensing, and smartphone use to identify digital indicators of psychotic relapse, Psychiatric Rehabilitation Journal, 2017
CrossCheck: Toward Passive Sensing and Detection of Mental Health Changes in People with Schizophrenia, UbiComp, 2016
Research Experience
Postdoctoral Researcher at Stanford Computer Science, working with Ludwig Schmidt on empirical evaluations of reasoning LLMs. Previously: Student Researcher at Google Research, ML Research Intern at Apple Health AI, and Data Scientist at HealthRhythms.
Education
PhD from the Paul G. Allen School of Computer Science & Engineering at the University of Washington, advised by Tim Althoff.
Background
Research Interests: Methods, datasets, and benchmarks for training and evaluating language models on time series data and code generation. My recent focus is building better datasets for training reasoning LLMs.