AI Summary
Problem: A critical bottleneck in solar physics and space weather forecasting is the lack of high-resolution, machine-learning-ready datasets. Method: We construct a standardized heliophysics dataset spanning a full solar cycle (May 2010–July 2024), derived from SDO/AIA and HMI observations. A unified preprocessing pipeline (spacecraft roll-angle correction, orbital compensation, exposure normalization, and instrument degradation compensation) ensures spatiotemporal consistency and physical interpretability. The dataset integrates multi-wavelength EUV imagery and vector magnetograms, and provides benchmark subsets for tasks including active region segmentation, solar flare prediction, and coronal magnetic field extrapolation. Contribution/Results: This work establishes a reproducible, comparable, task-driven data benchmark explicitly designed for AI research in solar physics. It aims to improve model development efficiency and evaluation consistency, thereby advancing intelligent space weather forecasting.
Abstract
This paper introduces a high-resolution, machine-learning-ready heliophysics dataset derived from NASA's Solar Dynamics Observatory (SDO), designed to advance machine learning (ML) applications in solar physics and space weather forecasting. The dataset includes processed imagery from the Atmospheric Imaging Assembly (AIA) and the Helioseismic and Magnetic Imager (HMI), spanning a solar cycle from May 2010 to July 2024. To make the data suitable for ML tasks, we preprocess them by correcting spacecraft roll angles, applying orbital adjustments, normalizing exposures, and compensating for instrument degradation. Complementing the core SDO dataset, we also provide auxiliary benchmark datasets for central heliophysics and space weather tasks: active region segmentation, active region emergence forecasting, coronal field extrapolation, solar flare prediction, solar EUV spectra prediction, and solar wind speed estimation. By establishing a unified, standardized data collection, this dataset aims to facilitate benchmarking, enhance reproducibility, and accelerate the development of AI-driven models for critical space weather prediction tasks, bridging the gaps between solar physics, machine learning, and operational forecasting.
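Two of the preprocessing steps named in the abstract, exposure normalization and degradation compensation, reduce to per-pixel arithmetic: a corrected intensity can be written as I_corr = I_DN / (t_exp · D), where I_DN is the raw frame in data numbers (DN), t_exp the exposure time in seconds, and D a dimensionless channel sensitivity factor. The sketch below is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `normalize_exposure_and_degradation` is hypothetical, and D is assumed to equal 1.0 at a reference epoch and to shrink as the channel loses sensitivity. For real AIA data, degradation factors come from the instrument team (the `aiapy` package, for example, provides degradation-correction utilities), and roll and orbital corrections would be applied alongside this step.

```python
import numpy as np


def normalize_exposure_and_degradation(
    image_dn: np.ndarray, exptime_s: float, degradation: float
) -> np.ndarray:
    """Convert a raw AIA-like frame from DN to DN/s and undo channel degradation.

    Hypothetical helper for illustration only. `degradation` is assumed to be a
    dimensionless relative-sensitivity factor: 1.0 at a reference epoch and
    smaller later in the mission as the channel degrades.
    """
    if exptime_s <= 0:
        raise ValueError("exposure time must be positive")
    if degradation <= 0:
        raise ValueError("degradation factor must be positive")
    # Exposure normalization: a count rate (DN/s) makes frames taken with
    # different exposure times directly comparable.
    rate = image_dn.astype(np.float64) / exptime_s
    # Degradation compensation: dividing by the remaining relative sensitivity
    # puts late-mission frames on the same intensity scale as early ones.
    return rate / degradation


if __name__ == "__main__":
    # Toy 2x2 frame: 2 s exposure, channel at half its reference sensitivity.
    frame = np.array([[100, 200], [0, 50]])
    print(normalize_exposure_and_degradation(frame, exptime_s=2.0, degradation=0.5))
```

In this toy case a pixel reading 100 DN becomes 50 DN/s after exposure normalization and 100 DN/s after compensating for the halved sensitivity. Keeping the two divisions separate makes each correction easy to inspect or disable when comparing preprocessing variants.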