🤖 AI Summary
Carbon neutrality depends on intelligent energy systems, and applying large language models (LLMs) to them requires domain-specific knowledge and awareness of physical constraints. General-purpose LLMs lack both this energy-domain expertise and the engineering alignment the field requires. To address this gap, we propose Helios, an LLM specialized for intelligent energy systems. We introduce Enersys, a multi-agent collaborative data engineering framework that generates three resources: EnerBase (a domain knowledge base), EnerInstruct (an instruction-tuning dataset), and EnerReinforce (an RLHF dataset for alignment with human preferences and industry standards). We also release EnerBench, an evaluation benchmark for LLMs in intelligent energy scenarios. After pretraining on domain knowledge, supervised fine-tuning, and RLHF, Helios improves in domain knowledge mastery, task execution accuracy, and alignment with human preferences and industry standards.
📝 Abstract
In the global drive toward carbon neutrality, deeply coordinated smart energy systems underpin industrial transformation. However, expertise in this domain is interdisciplinary, fragmented, and fast-evolving, so general-purpose large language models (LLMs), which lack domain knowledge and physical-constraint awareness, cannot deliver precise, engineering-aligned inference and generation. To address these challenges, we introduce Helios, an LLM tailored to the smart energy domain, together with a comprehensive suite of resources to advance LLM research in this field. Specifically, we develop Enersys, a multi-agent collaborative framework for end-to-end dataset construction, through which we produce: (1) a smart energy knowledge base, EnerBase, to enrich the model's foundational expertise; (2) an instruction fine-tuning dataset, EnerInstruct, to strengthen performance on domain-specific downstream tasks; and (3) a dataset for reinforcement learning from human feedback (RLHF), EnerReinforce, to align the model with human preferences and industry standards. Leveraging these resources, Helios undergoes large-scale pretraining, supervised fine-tuning (SFT), and RLHF. We also release EnerBench, a benchmark for evaluating LLMs in smart energy scenarios, and demonstrate that our approach significantly enhances domain knowledge mastery, task execution accuracy, and alignment with human preferences.