SciHorizon-DataEVA: An Agentic System for AI-Readiness Evaluation of Heterogeneous Scientific Data

📅 2026-04-29

🤖 AI Summary
This study addresses a gap in AI4Science: there is no scalable, systematic mechanism for evaluating the AI-readiness of heterogeneous scientific data. The authors propose SciHorizon-DataEVA, an agentic evaluation system grounded in the Sci-TQA2 principles. These principles organize AI-readiness into four dimensions (Governance Trustworthiness, Data Quality, AI Compatibility, and Scientific Adaptability) and decompose each dimension into measurable atomic elements. The system runs a knowledge-augmented, self-correcting multi-agent workflow that combines lightweight dataset profiling, applicability-aware metric activation, and an adaptive, tool-centric evaluation mechanism with built-in verification. The goal is fine-grained, actionable, and scalable assessment across scientific domains. Experiments on datasets from multiple scientific domains demonstrate the system's effectiveness and generality.
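The idea of decomposing AI-readiness into atomic elements and activating only the applicable ones can be illustrated with a minimal Python sketch. It is not the authors' implementation. Only the four dimension names come from the Sci-TQA2 principles. The profile fields (`modality`, `supervised`, `has_units`, and so on), the four example elements, the `sample_size` threshold of 10,000 records, and the mean-per-dimension aggregation are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical output of a lightweight profiling pass (field names are illustrative).
@dataclass
class DatasetProfile:
    modality: str        # e.g. "tabular", "image", "time_series"
    n_records: int
    has_license: bool
    has_labels: bool
    has_units: bool      # physical units documented for numeric fields
    supervised: bool     # dataset intended for supervised learning

# One measurable atomic element under a Sci-TQA2 dimension.
@dataclass
class AtomicElement:
    dimension: str
    name: str
    applies_to: Callable[[DatasetProfile], bool]   # applicability predicate
    score: Callable[[DatasetProfile], float]       # score in [0, 1]

# Illustrative elements only; the paper's actual element catalogue is not reproduced here.
ELEMENTS: List[AtomicElement] = [
    AtomicElement("Governance Trustworthiness", "license_declared",
                  lambda p: True, lambda p: 1.0 if p.has_license else 0.0),
    AtomicElement("Data Quality", "sample_size",
                  lambda p: True, lambda p: min(p.n_records / 10_000, 1.0)),
    AtomicElement("AI Compatibility", "label_availability",
                  lambda p: p.supervised, lambda p: 1.0 if p.has_labels else 0.0),
    AtomicElement("Scientific Adaptability", "units_documented",
                  lambda p: p.modality in {"tabular", "time_series"},
                  lambda p: 1.0 if p.has_units else 0.0),
]

def evaluate(profile: DatasetProfile, elements: List[AtomicElement]) -> Dict[str, float]:
    """Activate only applicable elements, then average their scores per dimension."""
    by_dimension: Dict[str, List[float]] = {}
    for element in elements:
        if element.applies_to(profile):          # applicability-aware activation
            by_dimension.setdefault(element.dimension, []).append(element.score(profile))
    # Dimensions with no active element are omitted rather than scored as zero.
    return {dim: sum(s) / len(s) for dim, s in by_dimension.items()}

profile = DatasetProfile(modality="image", n_records=2500, has_license=True,
                         has_labels=False, has_units=False, supervised=False)
print(evaluate(profile, ELEMENTS))
# {'Governance Trustworthiness': 1.0, 'Data Quality': 0.25}
```

Omitting inactive dimensions, instead of scoring them as zero, keeps a dataset from being penalized for checks that do not apply to it. The abstract does not state whether SciHorizon-DataEVA aggregates scores this way.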
📝 Abstract
AI-for-Science (AI4Science) is increasingly transforming scientific discovery by embedding machine learning models into prediction, simulation, and hypothesis generation workflows across domains. However, the effectiveness of these models is fundamentally constrained by the AI-readiness of scientific data, for which no scalable and systematic evaluation mechanism currently exists. In this work, we propose SciHorizon-DataEVA, a novel agentic system for scalable AI-readiness evaluation of heterogeneous scientific data. At the evaluation-criteria level, we introduce the Sci-TQA2 principles, which organize AI-readiness into four complementary dimensions: Governance Trustworthiness, Data Quality, AI Compatibility, and Scientific Adaptability. Each dimension is decomposed into measurable atomic elements that enable fine-grained and executable assessment. To operationalize these principles at scale, we develop Sci-TQA2-Eval, a hierarchical multi-agent evaluation approach orchestrated through a directed, cyclic workflow. Sci-TQA2-Eval dynamically constructs dataset-aware evaluation specifications by combining lightweight dataset profiling, applicability-aware metric activation, and knowledge-augmented planning grounded in domain constraints and dataset-paper signals. These specifications are executed through an adaptive, tool-centric evaluation mechanism with built-in verification and self-correction, enabling scalable and reliable assessment across heterogeneous scientific data. Extensive experiments on scientific datasets spanning multiple domains demonstrate the effectiveness and generality of SciHorizon-DataEVA for principled AI-readiness evaluation.
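The "verification and self-correction" step can be pictured as a bounded execute, verify, revise cycle around each tool call. The Python sketch below is a generic illustration under stated assumptions, not the paper's agents or tools. `execute`, `verify`, and `revise` are hypothetical callables standing in for a tool invocation, a result check, and an agent's repair of the evaluation specification; `max_rounds=3` is an arbitrary bound.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class EvalResult:
    metric: str
    value: float
    attempts: int
    verified: bool
    log: List[str] = field(default_factory=list)

def run_with_self_correction(
    metric: str,
    spec: dict,
    execute: Callable[[dict], float],             # hypothetical tool call
    verify: Callable[[float], Optional[str]],     # None if OK, else an issue description
    revise: Callable[[dict, str], dict],          # hypothetical spec repair step
    max_rounds: int = 3,
) -> EvalResult:
    """Bounded execute -> verify -> revise cycle for one evaluation metric."""
    log: List[str] = []
    for attempt in range(1, max_rounds + 1):
        try:
            value = execute(spec)
        except Exception as exc:                  # tool failures become issues to repair
            issue = f"tool error: {exc}"
        else:
            issue = verify(value)
            if issue is None:
                return EvalResult(metric, value, attempt, True, log)
        log.append(f"attempt {attempt}: {issue}")
        if attempt < max_rounds:                  # no point revising after the last round
            spec = revise(spec, issue)
    return EvalResult(metric, float("nan"), max_rounds, False, log)

# Toy usage: the first spec names a column that does not exist; revision fixes it.
data = {"temperature": [1.0, None, 3.0, 4.0]}

def execute(spec: dict) -> float:
    column = data[spec["column"]]
    return sum(v is None for v in column) / len(column)   # missing-value ratio

def verify(value: float) -> Optional[str]:
    return None if 0.0 <= value <= 1.0 else f"ratio out of range: {value}"

def revise(spec: dict, issue: str) -> dict:
    return {**spec, "column": "temperature"}

result = run_with_self_correction("missing_ratio", {"column": "temp"}, execute, verify, revise)
print(result.value, result.attempts, result.log)
# 0.25 2 ["attempt 1: tool error: 'temp'"]
```

Bounding the number of rounds keeps the cyclic workflow from looping indefinitely when a specification cannot be repaired. An unverified result is reported with its log instead of a silently accepted value.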
Problem

Research questions and friction points this paper is trying to address.

AI-readiness
scientific data
heterogeneous data
evaluation framework
AI4Science
Innovation

Methods, ideas, or system contributions that make the work stand out.

AI-readiness evaluation
agentic system
Sci-TQA2 principles
heterogeneous scientific data
multi-agent evaluation
Authors

Dianyu Liu
SciHorizon Team, Computer Network Information Center, Chinese Academy of Sciences, Beijing, China
Chuan Qin
CNIC, Chinese Academy of Sciences
Knowledge Computing, Representation Learning
Xi Chen
Professor, Institute of Atmospheric Physics, Chinese Academy of Sciences
computational fluid dynamics, geophysical fluid dynamics, dynamical core, numerical weather prediction
Xiaohan Li
Walmart Inc.
Data Mining, Recommender System, Medical AI
Wenxi Xu
SciHorizon Team, Computer Network Information Center, Chinese Academy of Sciences, Beijing, China
Yuyang Wang
SciHorizon Team, Computer Network Information Center, Chinese Academy of Sciences, Beijing, China
Xin Chen
SciHorizon Team, Computer Network Information Center, Chinese Academy of Sciences, Beijing, China
Yuanchun Zhou
Computer Network Information Center, CAS
Data Mining, Big Data Analysis
Hengshu Zhu
SciHorizon Team, Computer Network Information Center, Chinese Academy of Sciences, Beijing, China