🤖 AI Summary
Astronomical time-series analysis is severely constrained by the scarcity of real labeled data. To address this, we propose a pretraining–domain adaptation framework that leverages multi-survey synthetic time-series data. Our method jointly employs enhanced contrastive learning and an adversarial domain classifier to learn domain-invariant temporal representations. This enables zero-shot transfer across instruments (e.g., ZTF → LSST/Kepler) and across astrophysical phenomena, generalizing to unseen surveys and tasks using only ZTF-labeled data. Evaluated on three core tasks—classification, photometric redshift estimation, and anomaly detection—our approach achieves substantial improvements over baselines after fine-tuning with minimal real annotations. These results demonstrate strong cross-domain generalization and practical deployability in real-world astronomical applications.
📝 Abstract
Astronomical time-series analysis faces a critical limitation: the scarcity of labeled observational data. We present a pre-training approach that leverages simulations, significantly reducing the need for labeled examples from real observations. Our models, trained on simulated data from multiple astronomical surveys (ZTF and LSST), learn generalizable representations that transfer effectively to downstream tasks. Using classifier-based architectures enhanced with contrastive and adversarial objectives, we create domain-agnostic models that demonstrate substantial performance improvements over baseline methods in classification, redshift estimation, and anomaly detection when fine-tuned with minimal real data. Remarkably, our models exhibit effective zero-shot transfer capabilities, achieving comparable performance on future-telescope (LSST) simulations when trained solely on existing-telescope (ZTF) data. Furthermore, they generalize to very different astronomical phenomena (namely variable stars from NASA's *Kepler* telescope) despite being trained on transient events, demonstrating cross-domain capabilities. Our approach provides a practical solution for building general models when labeled data is scarce but domain knowledge can be encoded in simulations.
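The joint objective described above (a contrastive term plus an adversarial domain classifier whose reversed gradient pushes the encoder toward domain-invariant features) could be sketched as follows. This is a simplified NumPy illustration, not the paper's implementation: the InfoNCE formulation, the linear domain head (`W`, `b`), and the weighting `lam` are all illustrative assumptions.

```python
import numpy as np

def info_nce_loss(z_a, z_b, temperature=0.1):
    """Contrastive (InfoNCE) loss between two views of the same light curves.

    z_a, z_b: (N, D) L2-normalised embeddings; row i of each is a positive pair
    (e.g. the same object observed/simulated under different conditions).
    """
    logits = z_a @ z_b.T / temperature            # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))            # positives sit on the diagonal

def domain_adversarial_loss(features, domain_labels, W, b):
    """Cross-entropy of a linear domain classifier (e.g. ZTF=0 vs LSST=1).

    In training, the encoder would receive the *negated* gradient of this loss
    (gradient reversal), so features become uninformative about the survey.
    """
    logits = features @ W + b                     # (N, 2) domain logits
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return -np.mean(np.log(probs[np.arange(len(domain_labels)), domain_labels]))

# Toy embeddings standing in for encoder outputs on simulated light curves.
rng = np.random.default_rng(0)
N, D = 8, 16
z = rng.normal(size=(N, D)); z /= np.linalg.norm(z, axis=1, keepdims=True)
z2 = z + 0.01 * rng.normal(size=(N, D)); z2 /= np.linalg.norm(z2, axis=1, keepdims=True)

contrastive = info_nce_loss(z, z2)
adversarial = domain_adversarial_loss(
    z, rng.integers(0, 2, size=N), rng.normal(size=(D, 2)), np.zeros(2))
lam = 1.0                                         # assumed adversarial weight
total = contrastive + lam * adversarial
```

In a real training loop both terms would be differentiated through the encoder, with the adversarial gradient reversed; here the two losses are only evaluated to show how they combine into a single domain-invariant objective.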