Simulation-Based Pretraining and Domain Adaptation for Astronomical Time Series with Minimal Labeled Data

📅 2025-10-14
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Astronomical time-series analysis is severely constrained by the scarcity of real labeled data. To address this, we propose a pretraining and domain-adaptation framework that leverages multi-survey synthetic time-series data. Our method jointly employs enhanced contrastive learning and an adversarial classifier to learn domain-invariant temporal representations. It enables zero-shot transfer across instruments (e.g., ZTF → LSST or Kepler) and astrophysical phenomena, achieving strong generalization on unseen tasks using only ZTF-labeled data. Evaluated on three core tasks—classification, photometric redshift estimation, and anomaly detection—our approach achieves substantial improvements over baselines after fine-tuning with minimal real annotations. Results demonstrate strong cross-domain generalization and practical deployability in real-world astronomical applications.

📝 Abstract
Astronomical time-series analysis faces a critical limitation: the scarcity of labeled observational data. We present a pre-training approach that leverages simulations, significantly reducing the need for labeled examples from real observations. Our models, trained on simulated data from multiple astronomical surveys (ZTF and LSST), learn generalizable representations that transfer effectively to downstream tasks. Using classifier-based architectures enhanced with contrastive and adversarial objectives, we create domain-agnostic models that demonstrate substantial performance improvements over baseline methods in classification, redshift estimation, and anomaly detection when fine-tuned with minimal real data. Remarkably, our models exhibit effective zero-shot transfer capabilities, achieving comparable performance on future telescope (LSST) simulations when trained solely on existing telescope (ZTF) data. Furthermore, they generalize to very different astronomical phenomena (namely variable stars from NASA's Kepler telescope) despite being trained on transient events, demonstrating cross-domain capabilities. Our approach provides a practical solution for building general models when labeled data is scarce but domain knowledge can be encoded in simulations.
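The abstract's contrastive objective pulls together embeddings of two augmented views of the same light curve while pushing apart all others. The paper's exact loss is not given here, so the sketch below uses the standard NT-Xent formulation as a stand-in; the function name, temperature value, and augmentation scheme are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent contrastive loss over a batch of paired embeddings.

    z1, z2: (N, D) arrays of encoder outputs for two augmented views
    of the same N light curves (e.g. subsampled or noise-jittered copies).
    """
    z = np.concatenate([z1, z2], axis=0)              # (2N, D)
    z /= np.linalg.norm(z, axis=1, keepdims=True)     # unit-normalize rows
    sim = z @ z.T / temperature                       # scaled cosine similarity
    n = z1.shape[0]
    np.fill_diagonal(sim, -np.inf)                    # exclude self-similarity
    # The positive partner of row i is row i+n (and vice versa).
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    loss = -(sim[np.arange(2 * n), pos] - logsumexp)
    return loss.mean()
```

In use, the loss is low when the two views of each object embed close together relative to the rest of the batch, which is exactly the property that lets simulation-pretrained representations transfer with few real labels.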
Problem

Research questions and friction points this paper is trying to address.

Addressing scarcity of labeled astronomical time-series data
Enabling domain adaptation across different telescope surveys
Building generalizable models using simulation-based pretraining techniques
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages simulations for pretraining with minimal labeled data
Uses contrastive and adversarial objectives for domain-agnostic models
Enables cross-domain transfer between different astronomical phenomena
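The adversarial objective listed above is typically realized with a gradient reversal layer: a domain classifier tries to predict which survey (or sim vs. real) a representation came from, while the reversed gradient pushes the shared encoder toward survey-invariant features. The paper does not specify its exact mechanism, so the following is a minimal manual sketch of the two passes, with `lam` as an assumed reversal strength.

```python
import numpy as np

def grad_reverse_forward(x):
    # Forward pass is the identity: the domain classifier sees the
    # encoder features unchanged.
    return np.asarray(x)

def grad_reverse_backward(grad_output, lam=1.0):
    # Backward pass flips the sign. The domain classifier minimizes its
    # own loss on the un-flipped gradient, but the encoder receives
    # -lam * dL_domain/dfeatures, so it learns to *confuse* the domain
    # classifier, yielding domain-invariant representations.
    return -lam * np.asarray(grad_output)
```

During training, batches drawn from ZTF simulations, LSST simulations, and real observations would pass through the shared encoder, through this layer, and into a small domain-classification head; the task heads (classification, redshift, anomaly score) bypass the reversal.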
Rithwik Gupta
Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Daniel Muthukrishna
Massachusetts Institute of Technology, University of Cambridge, University of Queensland, Australian
Astronomy, Cosmology, Dark Energy, Supernovae, Machine Learning
Jeroen Audenaert
Massachusetts Institute of Technology, Cambridge, MA 02139, USA