HepScript: A Dual-Use DSL for Human-AI Collaborative Data Analysis Workflows in High-Energy Physics

📅 2026-05-02

🤖 AI Summary
High-energy physics data analysis relies on complex, experiment-specific codebases that are difficult to automate with large language models. To address this, the work proposes HepScript, a dual-use domain-specific language (DSL) for high-energy physics that is designed to be both readable by humans and reliably generable by AI, serving as a shared interface for human-AI collaboration. HepScript translates high-level analysis intent into production-ready code through a constrained syntax and code abstraction, lowering the barrier to automation. It also lets AI agents generate executable specifications for core analysis stages directly from published literature. In case studies on the BESIII experiment, the approach reduces human-written code by 93%, and AI agents generate core analysis stages with a 95% success rate.
📝 Abstract
The escalating data scale in High-Energy Physics (HEP) creates growing demand for more efficient analysis. While Large Language Models (LLMs) offer a path toward automation via agentic AI, they struggle with complex scientific workflows that require deep domain knowledge and are tightly coupled to experiment-specific codebases. To address this, we introduce a methodology centered on HepScript, a dual-use Domain-Specific Language (DSL) for HEP data analysis workflows. HepScript serves as a shared formal interface, abstracting HEP analysis logic into a constrained syntax that is both intuitive for human experts and reliably generable by AI agents. First developed for the Beijing Spectrometer III (BESIII) experiment, HepScript hides the complexity of the underlying software stack, translating high-level analysis intent into low-level, production-ready code. In our case studies, this abstraction reduces the required human-written code by 93%. Crucially, HepScript's constrained grammar defines a tractable action space, enabling AI agents to autonomously generate executable specifications for core analysis stages directly from published literature with a 95% success rate. Our work demonstrates a scalable pathway toward human-AI collaborative systems, where a formally specified DSL acts as an unambiguous translation layer between human expertise, AI automation, and production environments, rendering previously intractable automation problems solvable.
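The idea of a constrained grammar acting as a translation layer can be illustrated with a toy sketch. This is not HepScript: the abstract does not show its syntax, so the three keywords (`dataset`, `select`, `histogram`), the `Statement` class, and the generated helper names (`load_events`, `fill_histogram`) are all hypothetical, chosen only to show how a small closed vocabulary makes a specification easy to validate before any code is generated.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical toy grammar: each line is "<keyword> <argument>".
# These keywords are illustrative only and are NOT HepScript syntax.
ALLOWED_KEYWORDS = {"dataset", "select", "histogram"}

# Hypothetical templates for the "low-level" code each statement expands to.
TEMPLATES = {
    "dataset": 'events = load_events("{arg}")',
    "select": "events = [e for e in events if {arg}]",
    "histogram": 'fill_histogram(events, "{arg}")',
}


@dataclass
class Statement:
    keyword: str
    argument: str


def parse(spec: str) -> list[Statement]:
    """Parse a spec, rejecting anything outside the closed vocabulary."""
    statements = []
    for line_no, raw in enumerate(spec.strip().splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        keyword, _, argument = line.partition(" ")
        if keyword not in ALLOWED_KEYWORDS:
            raise ValueError(f"line {line_no}: unknown keyword {keyword!r}")
        if not argument.strip():
            raise ValueError(f"line {line_no}: {keyword!r} needs an argument")
        statements.append(Statement(keyword, argument.strip()))
    return statements


def translate(statements: list[Statement]) -> str:
    """Expand each validated statement into a line of target code."""
    return "\n".join(
        TEMPLATES[s.keyword].format(arg=s.argument) for s in statements
    )


if __name__ == "__main__":
    spec = """
    dataset run2024
    select e.energy > 1.0
    histogram mass
    """
    print(translate(parse(spec)))
```

Because the parser accepts only a fixed set of statement forms, a malformed or out-of-vocabulary line (whether written by a person or produced by an AI agent) is rejected with a line number rather than silently turned into code. A real DSL would need a much richer grammar and a real backend; the sketch only shows the validate-then-translate shape.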
Problem

Research questions and friction points this paper is trying to address.

High-Energy Physics
Large Language Models
scientific workflows
domain-specific language
human-AI collaboration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Domain-Specific Language
Human-AI Collaboration
High-Energy Physics
LLM-based Automation
Workflow Abstraction
Junkun Jiao
Jilin University, Changchun, Jilin, China
Tong Liu
Institute of Information Engineering, Chinese Academy of Sciences
AI Security, Software Security
Ke Li
Institute of Biomedical Engineering, Shandong University
Motor control, biomechanics, neurophysiology, nonlinear dynamics, cardiovascular system evaluation
Weimin Song
Jilin University, Changchun, Jilin, China
Yipu Liao
Institute of High Energy Physics, CAS, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Bolun Zhang
Institute of High Energy Physics, CAS, Beijing, China
Beijiang Liu
Institute of High Energy Physics, CAS, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Chang-Zheng Yuan
Institute of High Energy Physics, CAS, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Yue Sun
Institute of High Energy Physics, CAS, Beijing, China; University of Chinese Academy of Sciences, Beijing, China