Evaluating Hydro-Science and Engineering Knowledge of Large Language Models

📅 2025-12-03
🤖 AI Summary
Problem: Large language models (LLMs) lack a domain-specific evaluation benchmark for Hydro-Science and Engineering (Hydro-SE), which hinders rigorous assessment of their domain expertise. Method: We introduce Hydro-SE Bench, a benchmark of 4,000 multiple-choice questions across nine subfields that evaluates basic conceptual knowledge, engineering application ability, and reasoning and calculation ability. The benchmark is used to evaluate both leading commercial LLMs and small-parameter open-source models. Results: Commercial models reach higher accuracy (0.74 to 0.80) than small-parameter models (0.41 to 0.68). Models perform well in subfields close to the natural and physical sciences, yet all of them struggle with domain-specific knowledge such as industry standards and hydraulic structures. The work provides a standardized Hydro-SE evaluation framework, identifies where current models fall short, and gives a quantitative basis for domain-adapted model development and deployment.

📝 Abstract
Hydro-Science and Engineering (Hydro-SE) is a critical and irreplaceable domain that secures human water supply, generates clean hydropower, and mitigates flood and drought disasters. Because it serves multiple engineering objectives, Hydro-SE is inherently interdisciplinary and integrates scientific knowledge with engineering expertise. This integration requires extensive expert collaboration in decision-making, which makes the domain difficult for intelligent systems to support. With the rapid advancement of large language models (LLMs), their potential application in the Hydro-SE domain is being increasingly explored. However, the knowledge and application abilities of LLMs in Hydro-SE have not been sufficiently evaluated. To address this issue, we propose the Hydro-SE LLM evaluation benchmark (Hydro-SE Bench), which contains 4,000 multiple-choice questions. Hydro-SE Bench covers nine subfields and evaluates LLMs on three aspects: basic conceptual knowledge, engineering application ability, and reasoning and calculation ability. The evaluation results on Hydro-SE Bench show that accuracy ranges from 0.74 to 0.80 for commercial LLMs and from 0.41 to 0.68 for small-parameter LLMs. While LLMs perform well in subfields closely related to the natural and physical sciences, they struggle with domain-specific knowledge such as industry standards and hydraulic structures. Model scaling mainly improves reasoning and calculation abilities, but LLMs still have considerable room to improve on practical engineering application problems. This study highlights the strengths and weaknesses of LLMs for Hydro-SE tasks, providing model developers with clear training targets and Hydro-SE researchers with practical guidance for applying LLMs.
Problem

Research questions and friction points this paper is trying to address.

Evaluates large language models' knowledge in hydro-science and engineering domains.
Assesses LLMs' ability to apply engineering concepts and solve practical problems.
Identifies gaps in LLMs' understanding of industry standards and hydraulic structures.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposed Hydro-SE Bench benchmark with 4,000 multiple-choice questions
Evaluated LLMs across nine subfields and three ability aspects
Identified LLM strengths in sciences and weaknesses in engineering standards
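
To make the evaluation setup concrete, the following is a minimal sketch of how accuracy on a multiple-choice benchmark can be broken down by subfield and by ability aspect. It is not the authors' evaluation code. The record fields (`subfield`, `aspect`, `answer`, `predicted`), the helper `accuracy_by`, and the four toy records are all assumptions made for illustration; the paper's actual data format and question content are not shown here.

```python
from collections import defaultdict


def accuracy_by(records, key):
    """Group records by records[i][key] and return {group: fraction correct}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        total[r[key]] += 1
        if r["predicted"] == r["answer"]:
            correct[r[key]] += 1
    return {group: correct[group] / total[group] for group in total}


# Toy records (hypothetical, not taken from Hydro-SE Bench).
# "aspect" uses short labels for the three ability aspects named in the abstract.
records = [
    {"subfield": "industry standards", "aspect": "concept",
     "answer": "A", "predicted": "A"},
    {"subfield": "industry standards", "aspect": "calculation",
     "answer": "C", "predicted": "B"},
    {"subfield": "hydraulic structures", "aspect": "application",
     "answer": "D", "predicted": "D"},
    {"subfield": "hydraulic structures", "aspect": "application",
     "answer": "B", "predicted": "A"},
]

# Overall accuracy is the fraction of questions answered correctly.
overall = sum(r["predicted"] == r["answer"] for r in records) / len(records)
by_subfield = accuracy_by(records, "subfield")
by_aspect = accuracy_by(records, "aspect")

print(overall)
print(by_subfield)
print(by_aspect)
```

Reporting the same predictions along both groupings is what lets a per-subfield weakness (for example, industry standards) be separated from a per-aspect weakness (for example, calculation).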
Authors

Shiruo Hu
State Key Laboratory of Hydro-Science and Engineering, Department of Hydraulic Engineering, Tsinghua University, Beijing, 100084, China
Wenbo Shan
State Key Laboratory of Hydro-Science and Engineering, Department of Hydraulic Engineering, Tsinghua University, Beijing, 100084, China
Yingjia Li
State Key Laboratory of Hydro-Science and Engineering, Department of Hydraulic Engineering, Tsinghua University, Beijing, 100084, China
Zhiqi Wan
State Key Laboratory of Hydro-Science and Engineering, Department of Hydraulic Engineering, Tsinghua University, Beijing, 100084, China
Xinpeng Yu
State Key Laboratory of Hydro-Science and Engineering, Department of Hydraulic Engineering, Tsinghua University, Beijing, 100084, China
Yunjia Qi
Tsinghua University
Large Language Models, Information Extraction, Natural Language Processing
Haotian Xia
Rice University
Natural Language Processing, Sports Analytics
Yang Xiao
Zhipu AI, Beijing, 100084, China
Dingxiao Liu
Zhipu AI, Beijing, 100084, China
Jiaru Wang
State Key Laboratory of Hydro-Science and Engineering, Department of Hydraulic Engineering, Tsinghua University, Beijing, 100084, China
Chenxu Gong
State Key Laboratory of Hydro-Science and Engineering, Department of Hydraulic Engineering, Tsinghua University, Beijing, 100084, China
Ruixi Zhang
State Key Laboratory of Hydro-Science and Engineering, Department of Hydraulic Engineering, Tsinghua University, Beijing, 100084, China
Shuyue Wu
State Key Laboratory of Hydro-Science and Engineering, Department of Hydraulic Engineering, Tsinghua University, Beijing, 100084, China
Shibo Cui
State Key Laboratory of Hydro-Science and Engineering, Department of Hydraulic Engineering, Tsinghua University, Beijing, 100084, China
Chee Hui Lai
State Key Laboratory of Hydro-Science and Engineering, Department of Hydraulic Engineering, Tsinghua University, Beijing, 100084, China
Wei Luo
State Key Laboratory of Hydro-Science and Engineering, Department of Hydraulic Engineering, Tsinghua University, Beijing, 100084, China; CHN Energy Dadu River Big Data Services Co., Ltd., Chengdu, 610041, Sichuan, China
Yubin He
CHN Energy Dadu River Big Data Services Co., Ltd., Chengdu, 610041, Sichuan, China
Bin Xu
Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China
Jianshi Zhao
State Key Laboratory of Hydro-Science and Engineering, Department of Hydraulic Engineering, Tsinghua University, Beijing, 100084, China