🤖 AI Summary
In scientific computing, automatically generating high-reliability code with large language models (LLMs) remains hindered by domain-specific data scarcity and the practical infeasibility of RLHF within small expert communities. To address this, we propose the first multi-agent code generation framework centered on *unit-physics consistency*: dimensional analysis and conservation laws are encoded as verifiable unit tests, enabling a primitives-centric collaborative system that suppresses syntactic hallucinations, numerical inaccuracies, and configuration fragility. Our method integrates open-source LLMs, chain-of-thought decoding, and multi-agent coordination to achieve end-to-end scientific code synthesis. Evaluated on a combustion simulation task, the framework converges within 5–6 iterative refinement rounds. The generated code matches a human-written implementation in accuracy (mean error: 3.1×10⁻³ %), runs 33.4 % faster, uses 30 % less memory, and remains cost-effective.
📄 Abstract
Agentic large language models are proposed as autonomous code generators for scientific computing, yet their reliability on high-stakes problems remains unclear. Developing computational scientific software from natural-language queries remains broadly challenging due to (a) the sparse representation of domain codes in training data and (b) the limited feasibility of RLHF with a small expert community. To address these limitations, this work conceptualizes an inverse approach to code design, embodied in the Chain of Unit-Physics framework: a first-principles (or primitives)-centric, multi-agent system in which human expert knowledge is encoded as unit-physics tests that explicitly constrain code generation. The framework is evaluated on a nontrivial combustion task, used here as a representative benchmark scientific problem with realistic physical constraints. Closed-weight systems and code-focused agentic variants fail to produce correct end-to-end solvers despite tool and web access, exhibiting four recurrent error classes: interface (syntax/API) hallucinations, overconfident assumptions, numerical/physical incoherence, and configuration fragility. Open-weight models with chain-of-thought (CoT) decoding reduce interface errors but still yield incorrect solutions. On the benchmark task, the proposed framework converges within 5–6 iterations and matches the human-expert implementation (mean error of $3.1\times10^{-3}$ %), with a $\sim$33.4 % faster runtime and $\sim$30 % lower memory usage, at a cost comparable to mid-sized commercial APIs, yielding a practical template for physics-grounded scientific code generation. As datasets and models evolve, zero-shot code accuracy will improve; the Chain of Unit-Physics framework, however, goes further by embedding the first-principles analysis that is foundational to scientific codes.
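To make the idea of unit-physics tests concrete, the following is a minimal sketch of how a conservation law might be encoded as a verifiable unit test that constrains generated solver code. The toy `advance_mass_fractions` step and all names here are illustrative assumptions, not the paper's actual implementation:

```python
# Hedged sketch: physical invariants expressed as unit tests, in the spirit
# of the Chain of Unit-Physics framework. The "solver step" below is a toy
# stand-in for LLM-generated combustion code under test.

def advance_mass_fractions(Y, dt):
    """Toy solver step: relax species mass fractions toward a fixed
    equilibrium composition, renormalizing so total mass is conserved."""
    Y_eq = [0.2, 0.5, 0.3]  # assumed equilibrium mass fractions
    Y_new = [y + dt * (ye - y) for y, ye in zip(Y, Y_eq)]
    total = sum(Y_new)
    return [y / total for y in Y_new]

def test_mass_conservation():
    # Conservation law as a unit test: mass fractions must stay normalized
    # (sum to 1) after many integration steps.
    Y = [0.1, 0.6, 0.3]
    for _ in range(100):
        Y = advance_mass_fractions(Y, dt=0.01)
    assert abs(sum(Y) - 1.0) < 1e-9

def test_positivity():
    # Physical realizability constraint: no negative mass fractions.
    Y = [0.1, 0.6, 0.3]
    for _ in range(100):
        Y = advance_mass_fractions(Y, dt=0.01)
    assert all(y >= 0.0 for y in Y)

test_mass_conservation()
test_positivity()
print("unit-physics tests passed")
```

In the framework described above, tests of this kind (conservation, positivity, dimensional consistency) would serve as the machine-checkable contract that the multi-agent system iterates against until the generated solver satisfies them.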