FEABench: Evaluating Language Models on Multiphysics Reasoning Ability

📅 2025-04-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study evaluates the end-to-end capability of large language models (LLMs) in solving multiphysics engineering problems—a domain requiring rigorous numerical computation and domain-specific tool integration. Method: We introduce FEABench, the first benchmark tailored for finite element analysis (FEA), to systematically assess LLMs' ability to interpret natural-language engineering specifications, invoke the COMSOL Multiphysics API to run multiphysics simulations, and produce quantitative solutions. Our evaluation paradigm integrates API interaction, multi-step iterative refinement, and executable-code validation, enabling systematic assessment of reasoning robustness and operational reliability on realistic numerical problems. Contribution/Results: FEABench establishes a novel agent-centric evaluation framework for LLMs in scientific computing. Experimental results show that the best-performing strategy generates syntactically and semantically valid, executable API calls 88% of the time. This work bridges the gap between symbolic reasoning and high-fidelity numerical simulation, advancing LLMs toward engineering automation and autonomous scientific discovery.

📝 Abstract
Building precise simulations of the real world and invoking numerical solvers to answer quantitative problems is an essential requirement in engineering and science. We present FEABench, a benchmark to evaluate the ability of large language models (LLMs) and LLM agents to simulate and solve physics, mathematics and engineering problems using finite element analysis (FEA). We introduce a comprehensive evaluation scheme to investigate the ability of LLMs to solve these problems end-to-end by reasoning over natural language problem descriptions and operating COMSOL Multiphysics®, an FEA software, to compute the answers. We additionally design a language model agent equipped with the ability to interact with the software through its Application Programming Interface (API), examine its outputs and use tools to improve its solutions over multiple iterations. Our best performing strategy generates executable API calls 88% of the time. LLMs that can successfully interact with and operate FEA software to solve problems such as those in our benchmark would push the frontiers of automation in engineering. Acquiring this capability would augment LLMs' reasoning skills with the precision of numerical solvers and advance the development of autonomous systems that can tackle complex problems in the real world. The code is available at https://github.com/google/feabench
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs on multiphysics simulation using FEA
Developing agents to operate FEA software via API
Enhancing LLM reasoning with numerical solver precision
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses finite element analysis for physics simulations
Integrates LLMs with COMSOL via API
Iterative tool usage improves solution accuracy
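The iterative loop described above—generate API calls, execute them against the solver, and refine on failure—can be sketched as follows. This is a minimal illustration, not code from the FEABench repository: the LLM call and the COMSOL API executor are stubbed out, and all function names (`call_llm`, `execute_api_calls`, `solve`) are hypothetical.

```python
def call_llm(prompt: str) -> str:
    """Stub for an LLM query; a real agent would call a model API here.

    This toy model "fixes" its code once the prompt contains error feedback.
    """
    if "error" in prompt:
        return "model.geom().create(...)"   # balanced, executable call
    return "model.geom(.create(...)"        # malformed first attempt

def execute_api_calls(code: str) -> tuple[bool, str]:
    """Stub executor; a real agent would run the code via the COMSOL API.

    Here, a simple parenthesis-balance check stands in for real execution.
    """
    if code.count("(") != code.count(")"):
        return False, "error: unbalanced parentheses"
    return True, "ok"

def solve(problem: str, max_iters: int = 3) -> tuple[bool, str]:
    """Generate API calls, execute them, and feed errors back for refinement."""
    feedback = ""
    code = ""
    for _ in range(max_iters):
        code = call_llm(problem + feedback)
        ok, message = execute_api_calls(code)
        if ok:
            return True, code
        # Append the execution error so the next generation can correct it.
        feedback = f"\nPrevious attempt failed with: {message}"
    return False, code
```

The key design point mirrored from the paper's evaluation scheme is that solver output (including errors) is routed back into the model's context, so each iteration conditions on concrete execution feedback rather than regenerating blindly.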