Self-supervised Synthetic Pretraining for Inference of Stellar Mass Embedded in Dense Gas

📅 2025-10-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
In star-forming regions, dense gas obscuration and structural inhomogeneity severely bias conventional spherically symmetric dynamical mass estimates. To address this, we propose a self-supervised pretraining paradigm tailored to astrophysical imagery: large-scale Vision Transformer (ViT) pretraining within the DINOv2 framework on one million synthetic fractal images, which lets the model learn semantically rich, physically meaningful features without ground-truth annotations. With the backbone frozen, a lightweight regression head trained on limited high-resolution magnetohydrodynamic (MHD) simulation data achieves competitive stellar mass regression, slightly surpassing a fully supervised baseline trained on the same data. Principal component analysis (PCA) of the learned features uncovers low-dimensional structure strongly correlated with physical quantities such as gas density and turbulent scale, and the representations also support unsupervised spatial segmentation. This approach provides an interpretable, annotation-efficient pathway to stellar mass estimation in obscured environments.
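The frozen-feature regression protocol described above can be sketched in a few lines. This is a minimal illustration, not the paper's pipeline: the pretrained ViT backbone is stood in for by a fixed random projection, the "MHD simulation" data are synthetic, and all variable names (`W_frozen`, `extract_features`, `head`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the frozen DINOv2/ViT backbone: a fixed nonlinear projection
# from flattened images to a feature vector. In the paper this would be the
# pretrained transformer with all weights frozen.
n_pixels, n_features = 64 * 64, 128
W_frozen = rng.normal(size=(n_pixels, n_features)) / np.sqrt(n_pixels)

def extract_features(images):
    """Apply the frozen backbone; W_frozen is never updated."""
    return np.tanh(images.reshape(len(images), -1) @ W_frozen)

# Toy dataset: images whose "stellar mass" label is a hidden linear
# function of the frozen features, plus noise.
n_train = 200
images = rng.normal(size=(n_train, 64, 64))
true_w = rng.normal(size=n_features)
masses = extract_features(images) @ true_w + 0.01 * rng.normal(size=n_train)

# Only this lightweight ridge-regression head is fit on the limited data.
X = extract_features(images)
lam = 1e-3
head = np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ masses)

pred = X @ head
print("train RMSE:", np.sqrt(np.mean((pred - masses) ** 2)))
```

The design point is that the expensive component (the backbone) is trained once on cheap synthetic data, while only the small head ever sees the scarce, costly simulations.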

📝 Abstract
Stellar mass is a fundamental quantity that determines the properties and evolution of stars. However, estimating stellar masses in star-forming regions is challenging because young stars are obscured by dense gas and the regions are highly inhomogeneous, making spherical dynamical estimates unreliable. Supervised machine learning could link such complex structures to stellar mass, but it requires large, high-quality labeled datasets from high-resolution magneto-hydrodynamical (MHD) simulations, which are computationally expensive. We address this by pretraining a vision transformer on one million synthetic fractal images using the self-supervised framework DINOv2, and then applying the frozen model to limited high-resolution MHD simulations. Our results demonstrate that synthetic pretraining improves stellar mass predictions from frozen-feature regression, with the pretrained model performing slightly better than a supervised model trained on the same limited simulations. Principal component analysis of the extracted features further reveals semantically meaningful structures, suggesting that the model enables unsupervised segmentation of star-forming regions without the need for labeled data or fine-tuning.
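The PCA step the abstract mentions can be sketched as follows. The features here are synthetic stand-ins for the ViT patch embeddings (not the model's actual outputs), constructed so that most variance lies along a few directions, mimicking representations that track a handful of physical quantities.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in patch features: 1000 patches x 128 dims, driven by 3 latent
# factors (mimicking features correlated with, e.g., gas density).
n_patch, d = 1000, 128
latent = rng.normal(size=(n_patch, 3)) * np.array([10.0, 5.0, 2.0])
mix = rng.normal(size=(3, d))
features = latent @ mix + 0.1 * rng.normal(size=(n_patch, d))

# PCA via SVD of the centered feature matrix.
centered = features - features.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
explained = S**2 / np.sum(S**2)

# A handful of components capture nearly all the variance: the learned
# representation is effectively low-dimensional.
print("variance in first 3 PCs:", explained[:3].sum())

# Per-patch component scores; thresholding or clustering these maps is one
# simple route to unsupervised spatial segmentation.
pcs = centered @ Vt[:3].T
```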
Problem

Research questions and friction points this paper is trying to address.

Estimating stellar mass in obscured star-forming regions with complex structures
Overcoming computational expense of high-resolution MHD simulation datasets
Enabling unsupervised segmentation of star-forming regions without labeled data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised pretraining on synthetic fractal images
Using frozen vision transformer for stellar mass regression
Enabling unsupervised segmentation without labeled data
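One generic way to produce fractal images of the kind used for pretraining is spectral synthesis: filtering white noise with a power-law spectrum. This is an illustrative sketch, not the paper's actual generation procedure, and the exponent `beta` is an arbitrary choice.

```python
import numpy as np

def fractal_image(n=256, beta=3.0, seed=0):
    """Synthesize an n x n fractal image by giving random phases a
    power-law amplitude spectrum ~ |k|^(-beta/2)."""
    rng = np.random.default_rng(seed)
    kx = np.fft.fftfreq(n)[:, None]
    ky = np.fft.fftfreq(n)[None, :]
    k = np.sqrt(kx**2 + ky**2)
    k[0, 0] = 1.0  # avoid division by zero at the DC mode
    amplitude = k ** (-beta / 2.0)
    phases = np.exp(2j * np.pi * rng.random((n, n)))
    field = np.fft.ifft2(amplitude * phases).real
    # Normalize to [0, 1] so the result can be used as an image.
    field -= field.min()
    return field / field.max()

img = fractal_image()
print(img.shape, float(img.min()), float(img.max()))
```

Varying `beta` and the random seed yields an unlimited stream of label-free, structurally diverse training images.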
Keiya Hirashima
RIKEN Center for Interdisciplinary Theoretical and Mathematical Sciences
Machine learning · HPC · Galaxy formation and evolution
Shingo Nozaki
Department of Earth and Planetary Sciences, Kyushu University, Fukuoka, Japan
Naoto Harada
Department of Astronomy, University of Tokyo, Tokyo, Japan