🤖 AI Summary
Problem: In time-domain surveys, variable star classifiers generalize poorly because of biases in the training data, particularly class imbalance and insufficient coverage of the physical parameter space.
Method: We propose a physics-guided, self-regulating CNN framework that couples a classifier with a physics-enhanced variational autoencoder (VAE). The VAE's latent space is constrained by six astrophysical parameters from Gaia DR3, which enables physically consistent light-curve synthesis and resampling across the physical parameter space. During co-training, the generator supplies synthetic light curves where the classifier is confused or the data are sparse, so that data biases are corrected dynamically.
Results: Across several bias scenarios, self-regulated training yields statistically significant gains in classification accuracy over conventional training (p < 0.01), reducing the classifier's dependence on the training distribution and improving cross-survey generalization.
📝 Abstract
Over the last two decades, machine learning models have been widely applied and have proven effective in classifying variable stars, particularly with the adoption of deep learning architectures such as convolutional neural networks, recurrent neural networks, and transformer models. While these models have achieved high accuracy, they require high-quality, representative data and a large number of labelled samples for each star type to generalise well, which can be challenging in time-domain surveys. This challenge often leads to models learning and reinforcing biases inherent in the training data, an issue that is not easily detectable when validation is performed on subsamples from the same catalogue. The problem of biases in variable star data has been largely overlooked, and a definitive solution has yet to be established. In this paper, we propose a new approach to improve the reliability of classifiers in variable star classification by introducing a self-regulated training process. This process utilises synthetic samples generated by a physics-enhanced latent space variational autoencoder, incorporating six physical parameters from Gaia Data Release 3. Our method features a dynamic interaction between a classifier and a generative model, where the generative model produces ad-hoc synthetic light curves to reduce confusion during classifier training and populate underrepresented regions in the physical parameter space. Experiments conducted under various scenarios demonstrate that our self-regulated training approach outperforms traditional training methods for classifying variable stars on biased datasets, showing statistically significant improvements.
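The interaction between classifier and generator can be illustrated with a minimal sketch. The abstract does not specify how the number of synthetic light curves per class is chosen, so the rule below is an assumption made for illustration only: after each validation pass, a fixed budget of synthetic samples is split across classes in proportion to each class's misclassification rate. The function name `synthetic_budget` and its parameters are hypothetical, and the VAE that would generate the light curves is not shown.

```python
import numpy as np


def synthetic_budget(confusion, total):
    """Split `total` synthetic samples across classes by error rate.

    Hypothetical allocation rule (not taken from the paper): a class whose
    validation samples are often misclassified receives more synthetic
    light curves in the next training round.

    confusion[i, j] = number of class-i samples predicted as class j.
    """
    confusion = np.asarray(confusion, dtype=float)
    row_sums = confusion.sum(axis=1)
    misclassified = row_sums - np.diag(confusion)

    # Per-class error rate; classes with no validation samples are treated
    # as maximally uncertain (assumption).
    errors = np.ones_like(row_sums)
    seen = row_sums > 0
    errors[seen] = misclassified[seen] / row_sums[seen]

    if errors.sum() == 0:
        # Perfect validation accuracy: nothing to correct.
        return np.zeros(len(errors), dtype=int)

    # Largest-remainder rounding so the budget sums exactly to `total`.
    raw = errors / errors.sum() * total
    budget = np.floor(raw).astype(int)
    remainder = int(total - budget.sum())
    order = np.argsort(-(raw - budget), kind="stable")
    budget[order[:remainder]] += 1
    return budget


# Example: three classes, the second one is confused most often.
confusion = [[90, 10, 0],
             [20, 70, 10],
             [0, 0, 100]]
budget = synthetic_budget(confusion, total=100)
```

In a full training loop, the resulting per-class counts would be passed to the generative model, which would also need a rule for choosing where in the physical parameter space to sample; that part depends on details the abstract does not give.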