🤖 AI Summary
This study addresses the limitations of general-purpose large language models in space weather and heliophysics, where they lack both domain-specific expertise and the pedagogical ability to explain concepts clearly, limiting their usefulness in scientific education. To overcome this, the authors present SolarGPT-QA, the first LLaMA-3-based large language model that integrates domain-adaptive pretraining with instruction tuning designed explicitly for educational purposes. The model is trained on a curated corpus of scientific literature together with question-answer pairs generated by GPT-4 and refined by Grok-3, all written in a student-friendly narrative style. In zero-shot evaluation, the model significantly outperforms generic counterparts and achieves explanatory quality competitive with instruction-tuned baselines. Preliminary user testing suggests that its outputs are clearer and easier to understand than those of generic models, striking an effective balance between scientific accuracy and pedagogical accessibility.
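As a rough illustration of the pipeline the summary describes, the sketch below shows a two-stage recipe: continued (domain-adaptive) pretraining on a scientific corpus, followed by instruction tuning on question-answer pairs. This is a minimal sketch under stated assumptions, not the authors' implementation: the base checkpoint name, file paths, data schema, and hyperparameters are all illustrative placeholders.

```python
# Hypothetical two-stage training sketch (not the authors' released code).
# Stage 1: domain-adaptive pretraining on heliophysics literature.
# Stage 2: instruction tuning on GPT-4-generated, Grok-3-refined QA pairs.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

BASE = "meta-llama/Meta-Llama-3-8B"  # assumed LLaMA-3 base checkpoint
tokenizer = AutoTokenizer.from_pretrained(BASE)
tokenizer.pad_token = tokenizer.eos_token  # LLaMA tokenizers ship no pad token
model = AutoModelForCausalLM.from_pretrained(BASE)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)  # causal LM

# Stage 1: "corpus.jsonl" with a single "text" field is an assumed format.
corpus = load_dataset("json", data_files="corpus.jsonl")["train"]
corpus = corpus.map(tokenize, batched=True, remove_columns=["text"])
Trainer(
    model=model,
    args=TrainingArguments("dapt-ckpt", num_train_epochs=1,
                           per_device_train_batch_size=1,
                           gradient_accumulation_steps=16),
    train_dataset=corpus,
    data_collator=collator,
).train()

# Stage 2: assumed QA schema with "question" and "answer" fields,
# flattened into a single prompt/response string per example.
def to_text(example):
    return {"text": f"Question: {example['question']}\nAnswer: {example['answer']}"}

qa = load_dataset("json", data_files="qa_pairs.jsonl")["train"].map(to_text)
qa = qa.map(tokenize, batched=True,
            remove_columns=["question", "answer", "text"])
Trainer(
    model=model,
    args=TrainingArguments("sft-ckpt", num_train_epochs=2,
                           per_device_train_batch_size=1,
                           gradient_accumulation_steps=16),
    train_dataset=qa,
    data_collator=collator,
).train()
model.save_pretrained("sft-ckpt")
tokenizer.save_pretrained("sft-ckpt")
```

Running both stages on the same model object mirrors the combination the summary highlights: the domain vocabulary is absorbed first, then the pedagogical answer style is layered on top.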
📝 Abstract
Solar activity, including solar flares, coronal mass ejections (CMEs), and geomagnetic storms, can significantly impact satellites, aviation, power grids, data centers, and space missions. Extreme solar events can cause substantial economic damage with limited advance warning, underscoring the importance of early-warning systems, accurate forecasting, and effective education in space science. Although large language models (LLMs) perform well on general tasks, they often lack the domain-specific knowledge and pedagogical capability needed to explain complex space science concepts clearly. We introduce SolarGPT-QA, a question answering system built on a domain-adapted LLaMA-3 model. The model is trained on scientific literature and large-scale question-answer data generated with GPT-4 and refined with Grok-3 in a student-friendly storytelling style. Human pairwise evaluations show that SolarGPT-QA outperforms general-purpose models in zero-shot settings and achieves performance competitive with instruction-tuned models for educational explanations in space weather and heliophysics. A small pilot study of student comprehension further suggests improved clarity and accessibility of the generated explanations. Ablation experiments indicate that combining domain-adaptive pretraining with pedagogical fine-tuning is important for balancing scientific accuracy and educational effectiveness. This work represents an initial step toward a broader SolarGPT framework for space science education and forecasting.
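For concreteness, here is a minimal zero-shot inference sketch against an instruction-tuned checkpoint of the kind the abstract describes. The checkpoint path and the Question/Answer prompt template are assumptions; the paper does not specify a released model ID or prompt format.

```python
# Hedged inference sketch; "sft-ckpt" is a placeholder local checkpoint
# (e.g., the output of the training sketch above), and the prompt
# template is an assumed format, not the paper's.
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "sft-ckpt"
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt)

prompt = ("Question: Why do coronal mass ejections trigger "
          "geomagnetic storms?\nAnswer:")
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = output[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

A pairwise evaluation like the one reported would then show such an output next to a baseline model's answer and ask human judges which explanation is clearer and more accurate.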