SolARED: Solar Active Region Emergence Dataset for Machine Learning Aided Predictions

📅 2026-01-19
🤖 AI Summary
To support early prediction of solar eruptive activity, this work presents SolARED, a dataset that follows solar active regions (ARs) before, during, and after their emergence. It is built from SDO/HMI full-disk maps of Doppler velocity, magnetic field, and continuum intensity, which are remapped, tracked, and spatially binned into time series of acoustic power, unsigned magnetic flux, and continuum intensity. The result is a standardized, machine learning-ready collection covering 50 large active regions and their surrounding areas, observed between 2010 and 2023. SolARED is presented as the first structured dataset to capture this multi-parameter evolution prior to AR emergence, addressing a gap in data for early-warning solar forecasting. The dataset is publicly available through an interactive visualization web application to support research on operational space weather prediction.

📝 Abstract
The development of accurate forecasts of solar eruptive activity has become increasingly important for preventing potential impacts on space technologies and exploration. Therefore, it is crucial to detect Active Regions (ARs) before they start forming on the solar surface. This will enable the development of early-warning capabilities for upcoming space weather disturbances. For this reason, we prepared the Solar Active Region Emergence Dataset (SolARED). The dataset is derived from full-disk maps of the Doppler velocity, magnetic field, and continuum intensity, obtained by the Helioseismic and Magnetic Imager (HMI) onboard the Solar Dynamics Observatory (SDO). SolARED includes time series of remapped, tracked, and binned data that characterize the evolution of acoustic power of solar oscillations, unsigned magnetic flux, and continuum intensity for 50 large ARs before, during, and after their emergence on the solar surface, as well as surrounding areas observed on the solar disc between 2010 and 2023. The resulting ML-ready SolARED dataset is designed to support enhancements of predictive capabilities, enabling the development of operational forecasts for the emergence of active regions. The SolARED dataset is available at https://sun.njit.edu/sarportal/, through an interactive visualization web application.
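
The spatial binning step mentioned in the abstract can be illustrated with a minimal sketch. It assumes each tracked frame is a small 2D grid of line-of-sight magnetic field values and averages the unsigned field over non-overlapping square tiles. The function names, the tile size, and the use of a plain mean (rather than a flux integrated over pixel area) are illustrative assumptions, not the authors' pipeline.

```python
from statistics import fmean


def unsigned_field(magnetogram):
    """Return the absolute value of every pixel in a 2D list (hypothetical helper)."""
    return [[abs(value) for value in row] for row in magnetogram]


def bin_tiles(image, tile):
    """Average a 2D list over non-overlapping tile x tile blocks.

    Assumption: both image dimensions are exact multiples of `tile`.
    """
    rows, cols = len(image), len(image[0])
    if rows % tile or cols % tile:
        raise ValueError("image dimensions must be multiples of tile")
    binned = []
    for r in range(0, rows, tile):
        binned_row = []
        for c in range(0, cols, tile):
            block = [image[i][j]
                     for i in range(r, r + tile)
                     for j in range(c, c + tile)]
            binned_row.append(fmean(block))
        binned.append(binned_row)
    return binned


def tile_time_series(frames, tile):
    """Bin the unsigned field of each frame, giving one coarse map per time step."""
    return [bin_tiles(unsigned_field(frame), tile) for frame in frames]


# Two synthetic 2x4 frames (made-up values, not HMI data), binned with 2x2 tiles.
frames = [
    [[1.0, -1.0, 0.0, 0.0], [3.0, -3.0, 0.0, 0.0]],
    [[2.0, -2.0, 4.0, -4.0], [2.0, -2.0, 4.0, -4.0]],
]
series = tile_time_series(frames, tile=2)
```

Each entry of `series` is a coarse map for one time step, so following a single tile across the list gives that tile's time series. The same binning could in principle be applied to continuum intensity or to an acoustic power map derived from Doppler velocity.
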

Problem

Research questions and friction points this paper is trying to address.

solar active regions
early detection
space weather forecasting
solar eruption prediction
machine learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Solar Active Region
Machine Learning Dataset
Helioseismic and Magnetic Imager
Space Weather Forecasting
Acoustic Power
S. Kasapis
Department of Astrophysical Sciences, Princeton University, NJ, USA; Computational Physics Branch, NASA Ames Research Center, Moffett Field, CA, USA
Eren Dogan
Department of Data Science, New Jersey Institute of Technology, Newark, NJ, USA
I. Kitiashvili
Computational Physics Branch, NASA Ames Research Center, Moffett Field, CA, USA
Alexander G. Kosovichev
Computational Physics Branch, NASA Ames Research Center, Moffett Field, CA, USA; Center for Computational Heliophysics, Department of Physics, New Jersey Institute of Technology, Newark, NJ, USA
J. Stefan
Center for Computational Heliophysics, Department of Physics, New Jersey Institute of Technology, Newark, NJ, USA
Jake D. Butler
Computational Physics Branch, NASA Ames Research Center, Moffett Field, CA, USA; InuTeq, LLC, MD, USA
Jonas Tirona
Department of Data Science, New Jersey Institute of Technology, Newark, NJ, USA
Sarang Patil
Department of Data Science, New Jersey Institute of Technology, Newark, NJ, USA
Mengjia Xu
Assistant Professor, NJIT; CBMM, MIT; Applied Math, Brown University
Machine Learning, Graph Machine Learning, LLMs, Manifold Learning, Brain fMRI/MEG