PDEAgent-Bench: A Multi-Metric, Multi-Library Benchmark for PDE Solver Generation

📅 2026-05-10

🤖 AI Summary
This work addresses the lack of a standardized benchmark for evaluating code generation for partial differential equation (PDE) solvers, particularly with respect to numerical accuracy, computational efficiency, and compatibility with mainstream finite element libraries. The authors introduce PDEAgent-Bench, which they describe as the first multi-metric, multi-library benchmark for PDE solver generation. It comprises 645 structured instances spanning six mathematical problem types and eleven PDE classes, supports three finite element frameworks (DOLFINx, Firedrake, and deal.II), and uses a staged evaluation framework that checks code executability, numerical correctness, and performance in sequence. Experiments show that current large language models can often produce executable code, but their pass rate drops substantially once accuracy and efficiency requirements are enforced.
📝 Abstract
PDE-to-solver code generation aims to automatically synthesize executable numerical solvers from partial differential equation (PDE) specifications. This task requires not only understanding the mathematical structure of PDEs, but also selecting appropriate discretization schemes and solver configurations, and correctly implementing the resulting formulations in finite-element method (FEM) libraries. Existing code generation benchmarks mainly evaluate syntactic correctness or success on predefined test cases. To our knowledge, there is currently no publicly available benchmark specifically for PDE-to-solver code generation, and general-purpose code benchmarks do not fully capture the unique challenges of numerical PDE solution, such as ensuring solver accuracy, efficiency, and compatibility with professional FEM libraries. We introduce PDEAgent-Bench, to the best of our knowledge the first multi-metric, multi-library benchmark for PDE-to-solver code generation. PDEAgent-Bench contains 645 instances across 6 mathematical categories and 11 PDE families, and targets three common FEM libraries: DOLFINx, Firedrake, and deal.II. Each instance provides an agent-facing problem specification, a reference solution on a prescribed evaluation grid, and case-specific accuracy and runtime targets. PDEAgent-Bench adopts a staged evaluation framework in which generated solvers must sequentially pass executability, numerical accuracy, and computational efficiency checks. Experiments with representative LLMs and code agents show that models can often produce runnable code, but their pass rate drops substantially once accuracy and efficiency requirements are enforced. These results indicate that current agents remain limited in producing numerically reliable and efficient PDE solvers, and that PDEAgent-Bench provides a reproducible testbed grounded in the practical requirements of numerical PDE solving.
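The staged evaluation described in the abstract can be pictured with a minimal sketch. Everything named here is hypothetical and not taken from the benchmark: the `SolverRun` record, the `staged_verdict` function, and the verdict strings are illustrative, and the relative L2 error is only an assumed accuracy metric, since the abstract does not say which norm is used. The sketch shows the sequential gating only: a run that fails an earlier stage is never scored on a later one.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class SolverRun:
    """Hypothetical record of one generated solver's execution."""
    ran_ok: bool                        # did the script finish without error?
    values: Optional[Sequence[float]]   # solution sampled on the evaluation grid
    runtime_s: float                    # wall-clock time in seconds


def relative_l2_error(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Relative L2 error (an assumed metric, not confirmed by the source)."""
    num = math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)))
    den = math.sqrt(sum(r ** 2 for r in ref))
    return num / den if den > 0 else num


def staged_verdict(run: SolverRun, reference: Sequence[float],
                   error_tol: float, time_budget_s: float) -> str:
    """Apply the three checks in order and stop at the first failure."""
    # Stage 1: executability (ran, and produced output of the expected size).
    if not run.ran_ok or run.values is None or len(run.values) != len(reference):
        return "fail:executability"
    # Stage 2: numerical accuracy against the case-specific target.
    err = relative_l2_error(run.values, reference)
    if not math.isfinite(err) or err > error_tol:
        return "fail:accuracy"
    # Stage 3: computational efficiency against the case-specific runtime target.
    if run.runtime_s > time_budget_s:
        return "fail:efficiency"
    return "pass"
```

Under this gating, a solver that runs but returns NaN values is reported as an accuracy failure, not an efficiency failure, which is consistent with the reported gap between runnable code and code that passes all stages.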
Problem

Research questions and friction points this paper is trying to address.

PDE-to-solver code generation
numerical PDE solving
finite-element method
code generation benchmark
solver accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

PDE-to-solver code generation
multi-metric benchmark
finite element method (FEM)
numerical accuracy
computational efficiency
Zhen Hang
University of Science and Technology of China
Yushan Yashengjiang
University of Science and Technology of China
Junhui Li
Beijing University of Posts and Telecommunications
Huanshuo Dong
University of Science and Technology of China, Tencent
Yang Wei
Chongqing University of Posts and Telecommunications
adversarial attack, image forgery detection, image processing
Zhezheng Hao
Tencent, Zhejiang University
Jiangtao Ma
National University of Singapore
Songlin Bai
Alibaba Group
Haozhong Kai
Tsinghua University
Xihang Yue
Zhejiang University
Gangzong Si
University of Science and Technology of China
Dongming Jiang
University of Texas at Dallas
Chao Yao
Northwestern Polytechnical University
Zhanhua Hu
Rice University
Jiangqing Zhang
Shanghai Jiao Tong University
Pengwei Liu
PhD candidate, Zhejiang University
Physics-informed deep learning
Yaomin Shen
Zhejiang University
Xingyu Ren
Ph.D. graduate, Shanghai Jiao Tong University
Face Modeling, Generative AI
Lei Liu
Anhui University of Science & Technology
CV
Zikang Xu
Institute of Artificial Intelligence, Hefei Comprehensive National Science Center
Algorithm Fairness, Medical Image Analysis, MEG data analysis
Han Li
Computer Aided Medical Procedures (CAMP), Technische Universitaet Muenchen (TUM)
medical AI
Qingsong Yao
Stanford University | ICT, CAS
Medical Image Computing, Medical Image Analysis
Hande Dong
Tencent
machine learning, data mining, NLP
Hong Wang
Tencent