🤖 AI Summary
Astronomical variable-star light curves present domain-specific challenges—including irregular sampling, heteroscedastic noise, and sparse observations—that hinder the generalization of time-series foundation models (TSFMs).
Method: We introduce StarEmbed, the first publicly available benchmark for variable-star light curves, which evaluates zero-shot representations on three tasks: unsupervised clustering, supervised classification, and out-of-distribution source detection. Built on multi-band light curves from the Zwicky Transient Facility with expert-vetted labels (~40k light curves across seven astrophysical classes), it compares three TSFMs (MOIRAI, Chronos, Chronos-Bolt) and a domain-specific transformer (Astromer) against handcrafted astrophysical features.
Contribution/Results: The TSFMs, especially the Chronos models, outperform established astrophysics-specific baselines on some tasks and deliver state-of-the-art performance on out-of-distribution source detection, even though their training data contain no astronomical time series. These results support the use of generic TSFM representations in time-domain astronomy and motivate a shift away from task-specific, fully supervised pipelines.
📝 Abstract
Time series foundation models (TSFMs) are increasingly being adopted as highly capable general-purpose time series representation learners. Although their training corpora are vast, they exclude astronomical time series data. Observations of stars produce petascale time series with unique challenges including irregular sampling and heteroskedasticity. We introduce StarEmbed, the first public benchmark for rigorous and standardized evaluation of state-of-the-art TSFMs on stellar time series observations ("light curves"). We benchmark on three scientifically motivated downstream tasks: unsupervised clustering, supervised classification, and out-of-distribution source detection. StarEmbed integrates a catalog of expert-vetted labels with multivariate light curves from the Zwicky Transient Facility, yielding ~40k hand-labeled light curves spread across seven astrophysical classes. We evaluate the zero-shot representation capabilities of three TSFMs (MOIRAI, Chronos, Chronos-Bolt) and a domain-specific transformer (Astromer) against handcrafted feature extraction, the long-standing baseline in the astrophysics literature. Our results demonstrate that these TSFMs, especially the Chronos models, which are trained on data completely unlike the astronomical observations, can outperform established astrophysics-specific baselines in some tasks and effectively generalize to entirely new data. In particular, TSFMs deliver state-of-the-art performance on our out-of-distribution source detection benchmark. With the first benchmark of TSFMs on astronomical time series data, we test the limits of their generalization and motivate a paradigm shift in time-domain astronomy from using task-specific, fully supervised pipelines toward adopting generic foundation model representations for the analysis of petascale datasets from forthcoming observatories.
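To make the zero-shot protocol concrete, the following is a minimal sketch of how frozen embeddings might be scored on two of the three tasks (classification and out-of-distribution detection). It is not the StarEmbed pipeline: the embeddings are synthetic Gaussian clusters standing in for TSFM outputs, and the nearest-centroid classifier, the k-nearest-neighbor distance score, and all function names are assumptions chosen for illustration. Clustering is omitted for brevity.

```python
import numpy as np


def nearest_centroid_predict(train_emb, train_labels, test_emb):
    """Classify each test embedding by its closest class centroid."""
    classes = np.unique(train_labels)
    centroids = np.stack(
        [train_emb[train_labels == c].mean(axis=0) for c in classes]
    )
    # Pairwise Euclidean distances, shape (n_test, n_classes).
    dists = np.linalg.norm(test_emb[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]


def knn_ood_score(reference_emb, query_emb, k=5):
    """Mean distance to the k nearest reference embeddings.

    Larger scores mean the query looks less like the reference set.
    """
    dists = np.linalg.norm(
        query_emb[:, None, :] - reference_emb[None, :, :], axis=2
    )
    return np.sort(dists, axis=1)[:, :k].mean(axis=1)


# Synthetic stand-in for frozen model embeddings (assumption: a real run
# would replace `emb` with vectors produced by a pretrained model).
rng = np.random.default_rng(0)
n_classes, n_per_class, dim = 7, 40, 16
centers = rng.normal(scale=10.0, size=(n_classes, dim))
labels = np.repeat(np.arange(n_classes), n_per_class)
emb = centers[labels] + rng.normal(size=(labels.size, dim))

# Hold out one class entirely to play the role of out-of-distribution sources.
held_out = n_classes - 1
in_dist = labels != held_out
idx = rng.permutation(np.flatnonzero(in_dist))
split = idx.size // 2
train_idx, test_idx = idx[:split], idx[split:]

# Supervised classification on frozen embeddings.
pred = nearest_centroid_predict(emb[train_idx], labels[train_idx], emb[test_idx])
accuracy = float((pred == labels[test_idx]).mean())

# Out-of-distribution scoring: held-out class versus in-distribution test set.
scores_in = knn_ood_score(emb[train_idx], emb[test_idx])
scores_ood = knn_ood_score(emb[train_idx], emb[~in_dist])

print(f"accuracy on in-distribution classes: {accuracy:.3f}")
print(f"mean OOD score, in-distribution: {scores_in.mean():.2f}")
print(f"mean OOD score, held-out class:  {scores_ood.mean():.2f}")
```

The point of the sketch is that the embedding model is never fine-tuned: only lightweight, label-efficient procedures run on top of fixed vectors, so the same code can compare a TSFM against handcrafted features by swapping the array passed in as `emb`.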