SciHorizon: Benchmarking AI-for-Science Readiness from Scientific Data to Large Language Models

2025-03-12
Citations: 0
Influential: 0
AI Summary
A systematic evaluation framework for AI-for-Science (AI4Science) that jointly addresses data quality and model capability is still lacking. To bridge this gap, we propose a dual-dimensional assessment framework: (1) a four-dimensional AI-readiness evaluation system for scientific data, covering data quality, FAIRness, explainability, and compliance; and (2) a 16-dimensional interdisciplinary scientific competence benchmark for large language models (LLMs), grounded in knowledge, understanding, reasoning, multimodality, and values. Drawing on multidimensional metric modeling, scientific literature analysis, benchmark construction, and large-scale empirical evaluation, we present recommendation lists of AI-ready datasets for Earth and Life Sciences and assess over 20 open-source and closed-source LLMs. The results are publicly available at www.scihorizon.cn/en.

๐Ÿ“ Abstract
In recent years, the rapid advancement of Artificial Intelligence (AI) technologies, particularly Large Language Models (LLMs), has revolutionized the paradigm of scientific discovery, establishing AI-for-Science (AI4Science) as a dynamic and evolving field. However, an effective framework for the overall assessment of AI4Science is still lacking, particularly one that takes a holistic view of data quality and model capability. Therefore, in this study, we propose SciHorizon, a comprehensive assessment framework designed to benchmark the readiness of AI4Science from both scientific data and LLM perspectives. First, we introduce a generalizable framework for assessing AI-ready scientific data, encompassing four key dimensions (Quality, FAIRness, Explainability, and Compliance), which are subdivided into 15 sub-dimensions. Drawing on data resource papers published between 2018 and 2023 in peer-reviewed journals, we present recommendation lists of AI-ready datasets for both Earth and Life Sciences, making a novel and original contribution to the field. Concurrently, to assess the capabilities of LLMs across multiple scientific disciplines, we establish 16 assessment dimensions based on five core indicators (Knowledge, Understanding, Reasoning, Multimodality, and Values), spanning Mathematics, Physics, Chemistry, Life Sciences, and Earth and Space Sciences. Using the developed benchmark datasets, we have conducted a comprehensive evaluation of over 20 representative open-source and closed-source LLMs. All the results are publicly available and can be accessed online at www.scihorizon.cn/en.
Problem

Research questions and friction points this paper is trying to address.

Lack of framework for AI4Science assessment
Need for evaluating AI-ready scientific data quality
Assessment of LLMs across multiple scientific disciplines
Innovation

Methods, ideas, or system contributions that make the work stand out.

Develops SciHorizon for AI4Science readiness assessment
Assesses AI-ready data via Quality, FAIRness, Explainability, Compliance
Evaluates LLMs across 16 dimensions in multiple sciences
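
To make the four-dimension data assessment concrete, here is a minimal Python sketch of how per-dimension scores might be combined into a ranking for a recommendation list. Only the four dimension names come from the abstract. The 0 to 1 score scale, the equal default weights, the class and function names, and the example values are all hypothetical and do not reflect the actual SciHorizon scoring method.

```python
from dataclasses import dataclass

# Dimension names are taken from the abstract. Everything else in this
# sketch (score scale, weights, names) is an illustrative assumption.
DIMENSIONS = ("Quality", "FAIRness", "Explainability", "Compliance")


@dataclass
class DatasetAssessment:
    name: str
    scores: dict  # dimension name -> hypothetical score in [0, 1]

    def overall(self, weights=None):
        """Weighted mean over the four dimensions (equal weights by default)."""
        weights = weights or {d: 1.0 for d in DIMENSIONS}
        missing = [d for d in DIMENSIONS if d not in self.scores]
        if missing:
            raise ValueError(f"missing dimension scores: {missing}")
        total_weight = sum(weights[d] for d in DIMENSIONS)
        weighted_sum = sum(self.scores[d] * weights[d] for d in DIMENSIONS)
        return weighted_sum / total_weight


def rank(assessments):
    """Order datasets from highest to lowest overall score."""
    return sorted(assessments, key=lambda a: a.overall(), reverse=True)


if __name__ == "__main__":
    # Made-up example datasets and scores, for illustration only.
    candidates = [
        DatasetAssessment("dataset_a", {"Quality": 0.5, "FAIRness": 0.5,
                                        "Explainability": 1.0, "Compliance": 1.0}),
        DatasetAssessment("dataset_b", {"Quality": 1.0, "FAIRness": 1.0,
                                        "Explainability": 1.0, "Compliance": 1.0}),
    ]
    for a in rank(candidates):
        print(a.name, a.overall())
```

A weighted mean is only one possible aggregation; a real assessment could instead report the four dimensions (and their 15 sub-dimensions) separately without collapsing them into one number.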
Chuan Qin
Computer Network Information Center, Chinese Academy of Sciences
Xin Chen
Computer Network Information Center, Chinese Academy of Sciences
Chengrui Wang
Alibaba Group
Computer Vision
Pengmin Wu
Computer Network Information Center, Chinese Academy of Sciences
Xi Chen
Computer Network Information Center, Chinese Academy of Sciences; University of Science and Technology of China
Yihang Cheng
Computer Network Information Center, Chinese Academy of Sciences
Jingyi Zhao
Shenzhen Research Institute of Big Data
Inventory Routing, Stochastic Programming, Learning to Optimize, Meta-heuristic
Meng Xiao
Computer Network Information Center, Chinese Academy of Sciences
Xiangchao Dong
Computer Network Information Center, Chinese Academy of Sciences
Qingqing Long
Computer Network Information Center, Chinese Academy of Sciences
Boya Pan
Computer Network Information Center, Chinese Academy of Sciences
Han Wu
Hefei University of Technology
Chengzan Li
Computer Network Information Center, Chinese Academy of Sciences
Yuanchun Zhou
Computer Network Information Center, CAS
Data Mining, Big Data Analysis
Hui Xiong
Senior Scientist, Candela Corporation
Ultrafast dynamics, atomic molecular physics, free electron laser
Hengshu Zhu
Computer Network Information Center, Chinese Academy of Sciences